What's wrong with this hash?

What's wrong with this hash function? Specifically, what will happen when N=127 is passed in?
int hash3(char *k, int N)
{
    char *c; int h = 0;
    for (c = k; *c != '\0'; c++) {
        h = h | *c;
    }
    return (h % N);
}
This was a question that came up in a practice exam (with no solutions unfortunately). As I understand it, the function is using bitwise or to convert a string into an integer, and place it in a table of size N, but I don't really know why it would be faulty? Thanks in advance.

Monotonic functions like bitwise OR are not suitable for computing a hash.
As you know, the OR operator "|" works as follows.
0 | 0 = 0
0 | 1 = 1
1 | 0 = 1
1 | 1 = 1
The number of 1 bits never decreases, so OR is a monotonically increasing operation.
If OR is applied repeatedly over a lot of data, each bit of the result is very likely to end up as 1.
If you feed a long sequence of characters to your code (assuming 7-bit ASCII), you will get h = 127 ('1111111' in binary) frequently, or almost all of the time. With N=127 that is the worst case: 127 % 127 is 0, so most long strings land in slot 0.
Bitwise AND is not suitable either, because AND is a monotonically decreasing operation.
XOR is the most suitable bitwise operation for calculating a hash.
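To see the saturation concretely, here is a rough Python transcription of hash3 next to an XOR variant. The name xor_hash and the sample strings are my own, and plain ASCII input is assumed; treat it as a sketch, not a recommended hash.

```python
def hash3(key, n):
    """Python transcription of the C hash3 from the question (ASCII input assumed)."""
    h = 0
    for ch in key:
        h |= ord(ch)  # OR can only switch bits on, never off
    return h % n


def xor_hash(key, n):
    """Hypothetical XOR variant, for comparison only."""
    h = 0
    for ch in key:
        h ^= ord(ch)  # XOR can switch bits both on and off
    return h % n


# Both sample strings contain 'o' (1101111) and 'w' (1110111),
# so the OR accumulates to 1111111 = 127, and 127 % 127 == 0.
for s in ["hello world", "The quick brown fox"]:
    print(s, hash3(s, 127), xor_hash(s, 127))
```

Even short keys collide: "ab", "ba" and "c" all OR to 99.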

How can I filter my array of numbers in Matlab/Octave?

I have a very trivial example where I'm trying to filter by matching a String:
A = [0:1:999];
B = A(int2str(A) == '999');
This works:
A(A > 990);
This also works:
int2str(5) == '5'
I just can't figure out why I cannot put the two together. I get an error about nonconformant arguments.
int2str(A) produces a very long char array (of size 1 x 4996) containing the string representations of all those numbers (including spacing) appended together end to end.
int2str(A) == '999'
So, in the statement above, you're trying to compare a matrix of size 1 x 4996 with another of size 1 x 3. This, of course, fails as the two either need to be of the same size, or at least one needs to be a scalar, in which case scalar expansion rules apply.
A(A > 990);
The above works because of logical indexing rules, the result will be the elements from the indices of A for which that condition holds true.
int2str(5) == '5'
This only works because the result of the int2str call is a 1 x 1 matrix ('5') and you're comparing it to another matrix of the same size. Try int2str(555) == '55' and it'll fail with the same error as above.
I'm not sure what result you expected from the original statements, but maybe you're looking for this:
A = [0:1:999];
B = int2str(A(A == 999)) % outputs '999'
I am not sure that the int2str() conversion is what you are looking for. (Also, why do you need to convert numbers to strings and then carry out a char comparison?)
Suppose you have a simpler case:
A = 1:3;
strA = int2str(A)
strA =
1 2 3
Note that this is a 1x7 char array. Thus, comparing it against a scalar char:
strA == '2'
ans =
0 0 0 1 0 0 0
Now, you might wanna transpose A and carry out the comparison:
int2str(A')=='2'
ans =
0
1
0
However, this approach will not work if the numbers do not all have the same number of digits, because shorter numbers will be padded with spaces (try creating A = 1:10 and comparing against '2').
Instead, create a cell array of strings without whitespace and use strcmp():
csA = arrayfun(@int2str,A','un',0)
csA =
'1'
'2'
'3'
strcmp('2',csA)
It should be much faster, and more correct, to turn the string into a number than the other way around. Try:
B = A(A == str2double ('999'));
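For readers who don't have MATLAB at hand, the same advice can be sketched in Python. This is only an analog under my own assumptions: the list A stands in for 0:999, and the padded list imitates the space-padded rows that int2str(A') produces.

```python
A = list(range(1000))  # analog of A = 0:999
target = "999"

# Convert the search string once, then compare numbers.
B = [a for a in A if a == int(target)]

# Going the other way only works if the padding is stripped first.
padded = [str(a).rjust(3) for a in A]  # imitates space-padded rows
C = [a for a, s in zip(A, padded) if s.strip() == "2"]

print(B, C)
```

Without the strip(), the padded "  2" would never equal "2", which is the pitfall described above.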

Reading character by character from a string into an array

What is the usual way in MATLAB to read an integer into an array, digit by digit?
I'm trying to split a four digit integer, 1234 into an array [1 2 3 4].
Here is a very easy way to do it for a single integer
s = num2str(1234)
for t=length(s):-1:1
result(t) = str2num(s(t));
end
The most compact way however, would be:
'1234'-'0'
Or try this
result = str2num(num2str(1234)')'
You can use arrayfun:
arrayfun(@str2num, num2str(x))
Here is an elegant and efficient solution using a recursive function:
function d = int2dig(n)
if n >= 10
d = [int2dig(floor(n/10)),mod(n,10)];
else
d = n;
end
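For comparison, the character-offset trick and the recursive split can be sketched in Python as well. The function names are mine, and non-negative integers are assumed.

```python
def digits_via_string(n):
    """Same idea as '1234' - '0': subtract the code of '0' from each character."""
    return [ord(c) - ord("0") for c in str(n)]


def digits_recursive(n):
    """Same idea as int2dig: peel off the last digit, recurse on the rest."""
    if n >= 10:
        return digits_recursive(n // 10) + [n % 10]
    return [n]


print(digits_via_string(1234))
print(digits_recursive(1234))
```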

How do I determine if *exactly* one boolean is true, without type conversion?

Given an arbitrary list of booleans, what is the most elegant way of determining that exactly one of them is true?
The most obvious hack is type conversion: converting them to 0 for false and 1 for true and then summing them, and returning sum == 1.
I'd like to know if there is a way to do this without converting them to ints, actually using boolean logic.
(This seems like it should be trivial, idk, long week)
Edit: In case it wasn't obvious, this is more of a code-golf / theoretical question. I'm not fussed about using type conversion / int addition in PROD code, I'm just interested if there is way of doing it without that.
Edit2: Sorry folks it's a long week and I'm not explaining myself well. Let me try this:
In boolean logic, ANDing a collection of booleans is true if all of the booleans are true, ORing the collection is true if at least one of them is true. Is there a logical construct that will be true if exactly one boolean is true? XOR is this for a collection of two booleans for example, but any more than that and it falls over.
You can actually accomplish this using only boolean logic, although there's perhaps no practical value of that in your example. The boolean version is much more involved than simply counting the number of true values.
Anyway, for the sake of satisfying intellectual curiosity, here goes. First, the idea of using a series of XORs is good, but it only gets us half way. For any two variables x and y,
x ⊻ y
is true whenever exactly one of them is true. However, this does not continue to be true if you add a third variable z,
x ⊻ y ⊻ z
The first part, x ⊻ y, is still true if exactly one of x and y is true. If either x or y is true, then z needs to be false for the whole expression to be true, which is what we want. But consider what happens if both x and y are true. Then x ⊻ y is false, yet the whole expression can become true if z is true as well. So either one variable or all three must be true. In general, if you have a statement that is a chain of XORs, it will be true if an uneven number of variables are true.
Since one is an uneven number, this might prove useful. Of course, checking for an uneven number of truths is not enough. We additionally need to ensure that no more than one variable is true. This can be done in a pairwise fashion by taking all pairs of two variables and checking that they are not both true. Taken together, these two conditions ensure that exactly one of the variables is true.
Below is a small Python script to illustrate the approach.
from itertools import product
print("x|y|z|only_one_is_true")
print("======================")
for x, y, z in product([True, False], repeat=3):
    uneven_number_is_true = x ^ y ^ z
    max_one_is_true = (not (x and y)) and (not (x and z)) and (not (y and z))
    only_one_is_true = uneven_number_is_true and max_one_is_true
    print(int(x), int(y), int(z), only_one_is_true)
And here's the output.
x|y|z|only_one_is_true
======================
1 1 1 False
1 1 0 False
1 0 1 False
1 0 0 True
0 1 1 False
0 1 0 True
0 0 1 True
0 0 0 False
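The same two conditions extend to any number of variables. Here is a minimal sketch; the function name exactly_one is my own, and reduce/all are just used to chain the XORs and ANDs rather than writing them out by hand.

```python
from functools import reduce
from itertools import combinations
from operator import xor


def exactly_one(bools):
    """True if exactly one element is true, using XOR parity plus pairwise checks."""
    bools = list(bools)
    # Chain of XORs: true when an uneven number of inputs are true.
    uneven_number_is_true = reduce(xor, bools, False)
    # Every pair must not be true together: at most one input is true.
    max_one_is_true = all(not (a and b) for a, b in combinations(bools, 2))
    return uneven_number_is_true and max_one_is_true


print(exactly_one([False, True, False, False]))
print(exactly_one([True, True, True]))
```

The pairwise part is what makes this O(n^2) in the number of variables.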
Sure, you could do something like this (pseudocode, since you didn't mention language):
found = false;
alreadyFound = false;
for (boolean in booleans):
    if (boolean):
        found = true;
        if (alreadyFound):
            found = false;
            break;
        else:
            alreadyFound = true;
return found;
After your clarification, here it is with no integers.
bool IsExactlyOneBooleanTrue( bool *boolAry, int size )
{
bool areAnyTrue = false;
bool areTwoTrue = false;
for(int i = 0; (!areTwoTrue) && (i < size); i++) {
areTwoTrue = (areAnyTrue && boolAry[i]);
areAnyTrue |= boolAry[i];
}
return ((areAnyTrue) && (!areTwoTrue));
}
No-one mentioned that this "operation" we're looking for is shortcut-able similarly to boolean AND and OR in most languages. Here's an implementation in Java:
public static boolean exactlyOneOf(boolean... inputs) {
boolean foundAtLeastOne = false;
for (boolean bool : inputs) {
if (bool) {
if (foundAtLeastOne) {
// found a second one that's also true, shortcut like && and ||
return false;
}
foundAtLeastOne = true;
}
}
// we're happy if we found one, but if none found that's less than one
return foundAtLeastOne;
}
With plain boolean logic, it may not be possible to achieve what you want, because what you are asking for is a truth evaluation based not just on the truth values but also on additional information (a count, in this case). Boolean evaluation is binary logic; it cannot depend on anything but the operands themselves. And there is no way to reverse engineer the operands from a truth value, because there are four possible combinations of operands but only two results. Given a false, can you tell whether it came from F ^ F or T ^ T in your case, so that the next evaluation can be determined based on that?
booleanList.Where(y => y).Count() == 1;
Due to the large number of reads by now, here comes a quick clean up and additional information.
Option 1:
Ask if only the first variable is true, or only the second one, ..., or only the n-th variable.
x1 & !x2 & ... & !xn |
!x1 & x2 & ... & !xn |
...
!x1 & !x2 & ... & xn
This approach scales in O(n^2). The evaluation stops after the first positive match is found, so it is preferred if a positive match is likely.
Option 2:
Ask if there is at least one variable true in total. Additionally check every pair to contain at most one true variable (Anders Johannsen's answer)
(x1 | x2 | ... | xn) &
(!x1 | !x2) &
...
(!x1 | !xn) &
(!x2 | !x3) &
...
(!x2 | !xn) &
...
This option also scales in O(n^2) due to the number of possible pairs. Lazy evaluation stops the formula after the first counterexample, so it is preferred if a negative match is likely.
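Options 1 and 2 can be written out for n variables roughly as follows. This is a sketch with my own function names; xs is assumed to be a list of bools, and any/all stand in for the chained | and & of the formulas (both short-circuit, matching the lazy evaluation described above).

```python
def option1(xs):
    """OR over i of: x_i is true AND every other x_j is false."""
    return any(
        x and not any(y for j, y in enumerate(xs) if j != i)
        for i, x in enumerate(xs)
    )


def option2(xs):
    """At least one is true AND no pair is true together."""
    n = len(xs)
    return any(xs) and all(
        (not xs[i]) or (not xs[j])
        for i in range(n)
        for j in range(i + 1, n)
    )


print(option1([False, True, False]), option2([False, True, False]))
print(option1([True, True, False]), option2([True, True, False]))
```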
(Option 3):
This option involves a subtraction and is thus not a valid answer for the restricted setting. Nevertheless, it shows why looping over the values might not be the most beneficial solution in an unrestricted setting.
Treat x1 ... xn as a binary number x. Subtract one, then AND the result with x. The output is zero <=> x1 ... xn contains at most one true value (the old "check power of two" algorithm), so for exactly one true value x must also be nonzero.
x 00010000
x-1 00001111
AND 00000000
If the bits are already stored in such a bitboard, this might be beneficial over looping. Though, keep in mind this kills the readability and is limited by the available board length.
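A small sketch of the bitboard check, assuming the booleans are packed into a non-negative Python int (the helper names pack, at_most_one_bit and exactly_one_bit are mine):

```python
def pack(bools):
    """Pack a list of booleans into an int, first element in the lowest bit."""
    x = 0
    for i, b in enumerate(bools):
        if b:
            x |= 1 << i
    return x


def at_most_one_bit(x):
    # x - 1 flips the lowest set bit and everything below it,
    # so the AND is zero exactly when x has zero or one bit set.
    return (x & (x - 1)) == 0


def exactly_one_bit(x):
    # Rule out the all-false case separately.
    return x != 0 and at_most_one_bit(x)


print(exactly_one_bit(0b00010000))
print(exactly_one_bit(pack([True, False, True])))
```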
A last note to raise awareness: by now there exists a stack exchange called computer science which is exactly intended for this type of algorithmic questions
It can be done quite nicely with recursion, e.g. in Haskell
-- there isn't exactly one true element in the empty list
oneTrue [] = False
-- if the list starts with False, discard it
oneTrue (False : xs) = oneTrue xs
-- if the list starts with True, all other elements must be False
oneTrue (True : xs) = not (or xs)
// Javascript
Use .filter() on array and check the length of the new array.
// Example using array
isExactly1BooleanTrue(booleans: boolean[]) {
return booleans.filter(value => value === true).length === 1;
}
// Example using ...booleans
isExactly1BooleanTrue(...booleans) {
return booleans.filter(value => value === true).length === 1;
}
One way to do it is to perform pairwise AND and then check if any of the pairwise comparisons returned true with chained OR. In python I would implement it using
from itertools import combinations
def one_true(bools):
    pairwise_comp = [comb[0] and comb[1] for comb in combinations(bools, 2)]
    # at least one is true, and no pair is true together
    return any(bools) and not any(pairwise_comp)
This approach easily generalizes to lists of arbitrary length, although for very long lists, the number of possible pairs grows very quickly.
Python:
boolean_list.count(True) == 1
OK, another try. Call the different booleans b[i], and call a slice of them (a range of the array) b[i .. j]. Define functions none(b[i .. j]) and just_one(b[i .. j]) (can substitute the recursive definitions to get explicit formulas if required). We have, using C notation for logical operations (&& is and, || is or, ^ for xor (not really in C), ! is not):
none(b[i .. i + 1]) ~~> !b[i] && !b[i + 1]
just_one(b[i .. i + 1]) ~~> b[i] ^ b[i + 1]
And then recursively:
none(b[i .. j + 1]) ~~> none(b[i .. j]) && !b[j + 1]
just_one(b[i .. j + 1] ~~> (just_one(b[i .. j]) && !b[j + 1]) ^ (none(b[i .. j]) && b[j + 1])
And you are interested in just_one(b[1 .. n]).
The expressions will turn out horrible.
Have fun!
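The recursion above can be sketched directly in Python. One assumption of mine: I use the empty slice as the base case instead of a pair, which gives the same results and also covers a single element. The inputs are assumed to be real bools, since ^ is applied to them.

```python
def none_true(b):
    """none(b[i .. j + 1]) = none(b[i .. j]) and not b[j + 1]."""
    if not b:
        return True  # assumed base case: nothing in an empty slice is true
    return none_true(b[:-1]) and not b[-1]


def just_one(b):
    """just_one(b[i .. j + 1]) per the recursive definition, on a list of bools."""
    if not b:
        return False  # assumed base case: an empty slice has no true element
    head, last = b[:-1], b[-1]
    return (just_one(head) and not last) ^ (none_true(head) and last)


print(just_one([False, True, False]))
print(just_one([True, True, False]))
```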
That python script does the job nicely. Here's the one-liner it uses:
((x ∨ (y ∨ z)) ∧ (¬(x ∧ y) ∧ (¬(z ∧ x) ∧ ¬(y ∧ z))))
Retracted for Privacy and Anders Johannsen have already provided correct and simple answers, but neither solution scales very well (O(n^2)). If performance is important you can stick to the following solution, which performs in O(n):
def exact_one_of(array_of_bool):
    exact_one = more_than_one = False
    for array_elem in array_of_bool:
        more_than_one = (exact_one and array_elem) or more_than_one
        exact_one = (exact_one ^ array_elem) and (not more_than_one)
    return exact_one
(I used python and a for loop for simplicity. But of course this loop could be unrolled to a sequence of NOT, AND, OR and XOR operations)
It works by tracking two states per boolean variable/list entry:
is there exactly one "True" from the beginning of the list until this entry?
are there more than one "True" from the beginning of the list until this entry?
The states of a list entry can be simply derived from the previous states and corresponding list entry/boolean variable.
Python:
Let's look at an example. The steps are:
The function exactly_one_topping below takes three parameters.
It stores their values (True or False) in a list.
It checks whether exactly one value is true by testing that the count of True is exactly 1.
def exactly_one_topping(ketchup, mustard, onion):
    args = [ketchup, mustard, onion]
    if args.count(True) == 1:  # check if exactly one value is True
        return True
    else:
        return False
How do you want to count how many are true without, you know, counting? Sure, you could do something messy like (C syntax, my Python is horrible):
for(i = 0; i < last && !booleans[i]; i++)
;
if(i == last)
return 0; /* No true one found */
/* We have a true one, check there isn't another */
for(i++; i < last && !booleans[i]; i++)
;
if(i == last)
return 1; /* No more true ones */
else
return 0; /* Found another true */
I'm sure you'll agree that the win (if any) is slight, and the readability is bad.
It is not possible without looping. Check BitSet cardinality() in java implementation.
http://fuseyism.com/classpath/doc/java/util/BitSet-source.html
For two booleans A and B, we can do it this way:
if (A=true or B=true)and(not(A=true and B=true)) then
<enter statements>
end if

Is there a way to make Matlab ints behave more like C++ ints? [duplicate]

I'm working on a verification-tool for some VHDL-Code in MATLAB/Octave. Therefore I need data types which generate "real" overflows:
intmax('int32') + 1
ans = -2147483648
Later on, it would be helpful if I can define the bit width of a variable, but that is not so important right now.
When I build a C-like example, where a variable gets increased until it's smaller than zero, it spins forever and ever:
test = int32(2^30);
while (test > 0)
test = test + int32(1);
end
Another approach I tried was a custom "overflow"-routine which was called every time after a number is changed. This approach was painfully slow, not practicable and not working in all cases at all. Any suggestions?
In MATLAB, one option you have is to overload the methods that handle arithmetic operations for integer data types, creating your own custom overflow behavior that will result in a "wrap-around" of the integer value. As stated in the documentation:
You can define or overload your own methods for int* (as you can for any object) by placing the appropriately named method in an @int* folder within a folder on your path. Type help datatypes for the names of the methods you can overload.
This page of the documentation lists the equivalent methods for the arithmetic operators. The binary addition operation A+B is actually handled by the function plus(A,B). Therefore, you can create a folder called @int32 (placed in another folder on your MATLAB path) and put a function plus.m in there that will be used instead of the built-in method for int32 data types.
Here's an example of how you could design your overloaded plus function in order to create the overflow/underflow behavior you want:
function C = plus(A,B)
%# NOTE: This code sample is designed to work for scalar values of
%# the inputs. If one or more of the inputs is non-scalar,
%# the code below will need to be vectorized to accommodate,
%# and error checking of the input sizes will be needed.
if (A > 0) && (B > (intmax-A)) %# An overflow condition
C = builtin('plus',intmin,...
B-(intmax-A)-1); %# Wraps around to negative
elseif (A < 0) && (B < (intmin-A)) %# An underflow condition
C = builtin('plus',intmax,...
B-(intmin-A-1)); %# Wraps around to positive
else
C = builtin('plus',A,B); %# No problems; call the built-in plus.m
end
end
Notice that I call the built-in plus method (using the BUILTIN function) to perform addition of int32 values that I know will not suffer overflow/underflow problems. If I were to instead perform the integer addition using the operation A+B it would result in a recursive call to my overloaded plus method, which could lead to additional computational overhead or (in the worst-case scenario where the last line was C = A+B;) infinite recursion.
Here's a test, showing the wrap-around overflow behavior in action:
>> A = int32(2147483642); %# A value close to INTMAX
>> for i = 1:10, A = A+1; disp(A); end
2147483643
2147483644
2147483645
2147483646
2147483647 %# INTMAX
-2147483648 %# INTMIN
-2147483647
-2147483646
-2147483645
-2147483644
If you want to get C style numeric operations, you can use a MEX function to call the C operators directly, and by definition they'll work like C data types.
This method is a lot more work than gnovice's overrides, but it should integrate better into a large codebase and is safer than altering the definition for built-in types, so I think it should be mentioned for completeness.
Here's a MEX file which performs the C "+" operation on a Matlab array. Make one of these for each operator you want C-style behavior on.
/* c_plus.c - MEX function: C-style (not Matlab-style) "+" operation */
#include "mex.h"
#include "matrix.h"
#include <stdio.h>
void mexFunction(
int nlhs, mxArray *plhs[],
int nrhs, const mxArray *prhs[]
)
{
mxArray *out;
/* In production code, input/output type and bounds checks would go here. */
const mxArray *a = prhs[0];
const mxArray *b = prhs[1];
int i, n;
int *a_int32, *b_int32, *out_int32;
short *a_int16, *b_int16, *out_int16;
mxClassID datatype = mxGetClassID(a);
int n_a = mxGetNumberOfElements(a);
int n_b = mxGetNumberOfElements(b);
int a_is_scalar = n_a == 1;
int b_is_scalar = n_b == 1;
n = n_a >= n_b ? n_a : n_b;
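/* Shape the output like the non-scalar input, so a scalar a with an array b does not overrun out. */
out = mxCreateNumericArray(mxGetNumberOfDimensions(a_is_scalar ? b : a),
mxGetDimensions(a_is_scalar ? b : a), datatype, mxIsComplex(a) ? mxCOMPLEX : mxREAL);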
switch (datatype) {
case mxINT32_CLASS:
a_int32 = (int*) mxGetData(a);
b_int32 = (int*) mxGetData(b);
out_int32 = (int*) mxGetData(out);
for (i=0; i<n; i++) {
if (a_is_scalar) {
out_int32[i] = a_int32[0] + b_int32[i];
} else if (b_is_scalar) {
out_int32[i] = a_int32[i] + b_int32[0];
} else {
out_int32[i] = a_int32[i] + b_int32[i];
}
}
break;
case mxINT16_CLASS:
a_int16 = (short*) mxGetData(a);
b_int16 = (short*) mxGetData(b);
out_int16 = (short*) mxGetData(out);
for (i=0; i<n; i++) {
if (a_is_scalar) {
out_int16[i] = a_int16[0] + b_int16[i];
} else if (b_is_scalar) {
out_int16[i] = a_int16[i] + b_int16[0];
} else {
out_int16[i] = a_int16[i] + b_int16[i];
}
}
break;
/* Yes, you'd have to add a separate case for every numeric mxClassID... */
/* In C++ you could do it with a template. */
default:
mexErrMsgTxt("Unsupported array type");
break;
}
plhs[0] = out;
}
Then you have to figure out how to invoke it from your Matlab code. If you're writing all the code, you could just call "c_plus(a, b)" instead of "a + b" everywhere. Alternatively, you could create your own numeric wrapper class, e.g. @cnumeric, that holds a Matlab numeric array in its field and defines plus() and other operations that invoke the appropriate C-style MEX function.
classdef cnumeric
properties
x % the underlying Matlab numeric array
end
methods
function obj = cnumeric(x)
obj.x = x;
end
function out = plus(a,b)
[a,b] = promote(a, b); % for convenience, and to mimic Matlab implicit promotion
if ~isequal(class(a.x), class(b.x))
error('inputs must have same wrapped type');
end
out_x = c_plus(a.x, b.x);
out = cnumeric(out_x);
end
% You'd have to define the math operations that you want normal
% Matlab behavior on, too
function out = minus(a,b)
[a,b] = promote(a, b);
out = cnumeric(a.x - b.x);
end
function display(obj)
fprintf('%s = \ncnumeric: %s\n', inputname(1), num2str(obj.x));
end
function [a,b] = promote(a,b)
%PROMOTE Implicit promotion of numeric to cnumeric and doubles to int
if isnumeric(a); a = cnumeric(a); end
if isnumeric(b); b = cnumeric(b); end
if isinteger(a.x) && isa(b.x, 'double')
b.x = cast(b.x, class(a.x));
end
if isinteger(b.x) && isa(a.x, 'double')
a.x = cast(a.x, class(b.x));
end
end
end
end
Then wrap your numbers in @cnumeric where you want C-style int behavior and do math with them.
>> cnumeric(int32(intmax))
ans =
cnumeric: 2147483647
>> cnumeric(int32(intmax)) - 1
ans =
cnumeric: 2147483646
>> cnumeric(int32(intmax)) + 1
ans =
cnumeric: -2147483648
>> cnumeric(int16(intmax('int16')))
ans =
cnumeric: 32767
>> cnumeric(int16(intmax('int16'))) + 1
ans =
cnumeric: -32768
There's your C-style overflow behavior, isolated from breaking the primitive @int32 type. Plus, you can pass a @cnumeric object in to other functions that are expecting regular numerics and it'll "work" as long as they treat their inputs polymorphically.
Performance caveat: because this is an object, + will have the slower speed of a method dispatch instead of a builtin. If you have few calls on large arrays, this'll be fast, because the actual numeric operations are in C. Lots of calls on small arrays could slow things down, because you're paying the per-method-call overhead a lot.
I ran the following snippet of code
test = int32(2^31-12);
for i = 1:24
test = test + int32(1)
end
with unexpected results. It seems that, for Matlab, intmax('int32')+1==intmax('int32'). I'm running 2010a on a 64-bit Mac OS X.
Not sure that this is an answer, more a confirmation that Matlab behaves counterintuitively. However, the documentation for the intmax() function states:
Any value larger than the value returned by intmax saturates to the intmax value when cast to a 32-bit integer.
So I guess Matlab is behaving as documented.
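To make the saturating behavior concrete, here is a small Python model of it. This only imitates the documented clamping; it is not MATLAB's implementation, and the name saturating_add_int32 is made up for the illustration.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def saturating_add_int32(a, b):
    """Model of int32 addition that clamps to the int32 range instead of wrapping."""
    return max(INT32_MIN, min(INT32_MAX, a + b))


# Mirrors the snippet above: start at 2^31 - 12 and add 1 twenty-four times.
test = 2**31 - 12
for _ in range(24):
    test = saturating_add_int32(test, 1)

print(test == INT32_MAX)
```

After eleven additions the value reaches the maximum and then stays there, which is why the while loop in the question never terminates.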
Hm, yes...
Actually, I was able to solve the problem with my custom "overflow"-Subroutine... Now it runs painfully slow, but without unexpected behaviour! My mistake was a missing round(), since Matlab/Octave will introduce small errors.
But if someone knows a faster solution, I would be glad to try it!
function ret = overflow_sg(arg,bw)
% remove possible rounding errors, and prepare returnvalue (if number is inside boundaries, nothing will happen)
ret = round(arg);
argsize = size(ret);
for i = 1:argsize(1)
for j = 1:argsize(2)
ret(i,j) = flow_sg(ret(i,j),bw);
end
end
end%function
%---
function ret = flow_sg(arg,bw)
ret = arg;
while (ret < (-2^(bw-1)))
ret = ret + 2^bw;
end
% Check for overflows:
while (ret > (2^(bw-1)-1))
ret = ret - 2^bw;
end
end%function
If 64 bits is enough to not overflow, and you need a lot of these, perhaps do this:
function ret = overflow_sg(arg,bw)
mask = int64(0);
for i=1:round(bw)
mask = bitset(mask,i);
end
topbit = bitshift(int64(1),round(bw-1));
subfrom = double(bitshift(topbit,1));
ret = bitand( int64(arg) , mask );
i = (ret >= topbit);
ret(i) = int64(double(ret(i))-subfrom);
if (bw<=32)
ret = int32(ret);
end
end
Almost everything is done as a matrix calculation, and a lot is done with bits, and everything is done in one step (no while loops), so it should be pretty fast. If you're going to populate it with rand, subtract 0.5 since it assumes it should round to integer values (rather than truncate).
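The mask-and-subtract idea is easy to check outside MATLAB. A minimal Python sketch for a single integer, assuming bw is the signed bit width (the name wrap_signed is mine):

```python
def wrap_signed(value, bw):
    """Wrap an integer into the signed two's-complement range of width bw."""
    mask = (1 << bw) - 1       # bw low bits set
    topbit = 1 << (bw - 1)     # sign bit position
    ret = value & mask         # keep only the low bw bits
    if ret >= topbit:          # sign bit set: value is negative
        ret -= 1 << bw
    return ret


print(wrap_signed(2**31, 32))   # one past the int32 maximum
print(wrap_signed(32768, 16))   # one past the int16 maximum
print(wrap_signed(-32769, 16))  # one below the int16 minimum
```

Python's & on negative ints behaves as if they were infinitely sign-extended two's complement, so the same mask handles underflow too.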
I'm not a Java expert, but the underlying Java classes available in Matlab should allow handling of overflows like C would. One solution I found works only for a single value, but it converts a number to the int16 (Short) or int32 (Integer) representation. You must do your math using Matlab double, then convert to Java int16 or int32, then convert back to Matlab double. Unfortunately Java doesn't appear to support unsigned types in this way, only signed.
double(java.lang.Short(hex2dec('7FFF')))
ans = 32767
double(java.lang.Short(hex2dec('7FFF')+1))
ans = -32768
double(java.lang.Short(double(intmax('int16'))+1))
ans = -32768
double(java.lang.Integer(hex2dec('7FFF')+1))
ans = 32768
https://www.tutorialspoint.com/java/lang/java_lang_integer.htm
