Vector/Array to integer - objective-c++

-(void)userShow{
xVal = new vector<double>();
yVal = new vector<double>();
xyVal = new vector<double>();
xxVal = new vector<double>();
value = new vector<double>();
for(it = xp->begin(); it != xp->end(); ++it){
xVal->push_back(it->y);
xxVal->push_back(it->x);
}
for(it = yp->begin(); it != yp->end(); ++it){
xyVal->push_back(it->x);
yVal->push_back(it->y);
}
for (int i = 0; i < xVal->size(); i++){
int c = (*xVal)[i];
for(int i = 0; xyVal[i] < xxVal[i]; i++){
double value = yVal[c-1] + (yVal[c] - yVal[c-1])*(xxVal[i] - xyVal[c-1])/(xyVal[c] - xyVal[c-1]);
yVal->push_back(value);
}
}
}
I am having an issue with the double value = ... part of my code. I get three errors saying invalid operands to binary expression ('vector<double>' and 'vector<double>') pointing to the c.
should int c = (*xVal)[i]; be double c = (*xVal)[i]; when i try to use double i get 6 errors saying Array subscript is not an integer. Which means I need to convert the array into an integer. How am I getting an array if I am using vectors? Just a lot of confusion at the moment.
Not really sure if i really need to explain what the code is supposed to do, but if it helps. I am trying to get it so it take two vectors splits the vectors x and y's into x and y. then take the y of xp and the y of yp and put them together. but because xp and yp vectors do not match i need to use the for loop and the double value algorithm to get a decent set of numbers.

The c is fine. The problem really is in double value = .., as your compiler says. You have pointers, so you can't access the array's elements like this:
double value = yVal[c-1] + ...
It must be
double value = (*yVal)[c-1] +
The same for xyVal, xxVal, etc. You need to fix the whole inner for loop.
But why you allocate the vectors like this...? Is there any reason to use new? This is so error prone. I'd use just
vector<double> xVar;
instead of
xVal = new vector<double>();
And then use . instead of -> combined with *. It so much easier.
Ah, forgot about the question for c - no, it should not be double. You can't use floating point numbers for indices. Also, if xVal is supposed to contain integer numbers (so that they can be used for indices), why don't you just declare the vector as vector< int > instead of vector< double >? I don't what's the logic in your program, but it looks like it(the logic) should be improved, IMO.

Related

Assigning an array to a vector 32 bits at the time

I have an array 32 bit wide of n elements and I am trying to assign these elements to a vector, I have the following code:
function automatic logic [SIZE-1:0] my_function (my_array x_map);
logic SIZE-1:0] y_map = '0;
int fctr = (SIZE)/32;
int top_bnd = 31;
int lwr_bnd = 0;
for(int k0 = 0; k0 < fctr; k0++)
begin
y_map[top_bnd:lwr_bnd] = x_map[k0];
top_bnd = (top_bnd + 32'hFFFF);
lwr_bnd = (lwr_bnd + 32'hFFFF);
end
return y_map;
endfunction
However this is not working and I get two errors:
1) "the range of the part select is illegal"
2) "Cannot evaluate the expression in left slicing expression, the expression must be compile time constant"
Thanks
You might want to use the streaming operators for this
y_map = {<<32{x_map}};
BTW, you should show the declarations of all identifiers in your example, i.e. my_array.

convert number string into float with specific precision (without getting rounding errors)

I have a vector of cells (say, size of 50x1, called tokens) , each of which is a struct with properties x,f1,f2 which are strings representing numbers. for example, tokens{15} gives:
x: "-1.4343429"
f1: "15.7947111"
f2: "-5.8196158"
and I am trying to put those numbers into 3 vectors (each is also 50x1) whose type is float. So I create 3 vectors:
x = zeros(50,1,'single');
f1 = zeros(50,1,'single');
f2 = zeros(50,1,'single');
and that works fine (why wouldn't it?). But then when I try to populate those vectors: (L is a for loop index)
x(L)=tokens{L}.x;
.. also for the other 2
I get :
The following error occurred converting from string to single:
Conversion to single from string is not possible.
Which I can understand; implicit conversion doesn't work for single. It does work if x, f1 and f2 are of type 50x1 double.
The reason I am doing it with floats is because the data I get is from a C program which writes the some floats into a file to be read by matlab. If I try to convert the values into doubles in the C program I get rounding errors...
So, (after what I hope is a good question,) how might I be able to get the numbers in those strings, at the right precision? (all the strings have the same number of decimal places: 7).
The MCVE:
filedata = fopen('fname1.txt','rt');
%fname1.txt is created by a C program. I am quite sure that the problem isn't there.
scanned = textscan(filedata,'%s','Delimiter','\n');
raw = scanned{1};
stringValues = strings(50,1);
for K=1:length(raw)
stringValues(K)=raw{K};
end
clear K %purely for convenience
regex = 'x=(?<x>[\-\.0-9]*),f1=(?<f1>[\-\.0-9]*),f2=(?<f2>[\-\.0-9]*)';
tokens = regexp(stringValues,regex,'names');
x = zeros(50,1,'single');
f1 = zeros(50,1,'single');
f2 = zeros(50,1,'single');
for L=1:length(tokens)
x(L)=tokens{L}.x;
f1(L)=tokens{L}.f1;
f2(L)=tokens{L}.f2;
end
Use function str2double before assigning into yours arrays (and then cast it to single if you want). Strings (char arrays) must be explicitely converted to numbers before using them as numbers.

Is there a way to make Matlab ints behave more like C++ ints? [duplicate]

I'm working on a verification-tool for some VHDL-Code in MATLAB/Octave. Therefore I need data types which generate "real" overflows:
intmax('int32') + 1
ans = -2147483648
Later on, it would be helpful if I can define the bit width of a variable, but that is not so important right now.
When I build a C-like example, where a variable gets increased until it's smaller than zero, it spins forever and ever:
test = int32(2^30);
while (test > 0)
test = test + int32(1);
end
Another approach I tried was a custom "overflow"-routine which was called every time after a number is changed. This approach was painfully slow, not practicable and not working in all cases at all. Any suggestions?
In MATLAB, one option you have is to overload the methods that handle arithmetic operations for integer data types, creating your own custom overflow behavior that will result in a "wrap-around" of the integer value. As stated in the documentation:
You can define or overload your own
methods for int* (as you can for any
object) by placing the appropriately
named method in an #int* folder within
a folder on your path. Type help
datatypes for the names of the methods
you can overload.
This page of the documentation lists the equivalent methods for the arithmetic operators. The binary addition operation A+B is actually handled by the function plus(A,B). Therefore, you can create a folder called #int32 (placed in another folder on your MATLAB path) and put a function plus.m in there that will be used instead of the built-in method for int32 data types.
Here's an example of how you could design your overloaded plus function in order to create the overflow/underflow behavior you want:
function C = plus(A,B)
%# NOTE: This code sample is designed to work for scalar values of
%# the inputs. If one or more of the inputs is non-scalar,
%# the code below will need to be vectorized to accommodate,
%# and error checking of the input sizes will be needed.
if (A > 0) && (B > (intmax-A)) %# An overflow condition
C = builtin('plus',intmin,...
B-(intmax-A)-1); %# Wraps around to negative
elseif (A < 0) && (B < (intmin-A)) %# An underflow condition
C = builtin('plus',intmax,...
B-(intmin-A-1)); %# Wraps around to positive
else
C = builtin('plus',A,B); %# No problems; call the built-in plus.m
end
end
Notice that I call the built-in plus method (using the BUILTIN function) to perform addition of int32 values that I know will not suffer overflow/underflow problems. If I were to instead perform the integer addition using the operation A+B it would result in a recursive call to my overloaded plus method, which could lead to additional computational overhead or (in the worst-case scenario where the last line was C = A+B;) infinite recursion.
Here's a test, showing the wrap-around overflow behavior in action:
>> A = int32(2147483642); %# A value close to INTMAX
>> for i = 1:10, A = A+1; disp(A); end
2147483643
2147483644
2147483645
2147483646
2147483647 %# INTMAX
-2147483648 %# INTMIN
-2147483647
-2147483646
-2147483645
-2147483644
If you want to get C style numeric operations, you can use a MEX function to call the C operators directly, and by definition they'll work like C data types.
This method is a lot more work than gnovice's overrides, but it should integrate better into a large codebase and is safer than altering the definition for built-in types, so I think it should be mentioned for completeness.
Here's a MEX file which performs the C "+" operation on a Matlab array. Make one of these for each operator you want C-style behavior on.
/* c_plus.c - MEX function: C-style (not Matlab-style) "+" operation */
#include "mex.h"
#include "matrix.h"
#include <stdio.h>
void mexFunction(
int nlhs, mxArray *plhs[],
int nrhs, const mxArray *prhs[]
)
{
mxArray *out;
/* In production code, input/output type and bounds checks would go here. */
const mxArray *a = prhs[0];
const mxArray *b = prhs[1];
int i, n;
int *a_int32, *b_int32, *out_int32;
short *a_int16, *b_int16, *out_int16;
mxClassID datatype = mxGetClassID(a);
int n_a = mxGetNumberOfElements(a);
int n_b = mxGetNumberOfElements(b);
int a_is_scalar = n_a == 1;
int b_is_scalar = n_b == 1;
n = n_a >= n_b ? n_a : n_b;
out = mxCreateNumericArray(mxGetNumberOfDimensions(a), mxGetDimensions(a),
datatype, mxIsComplex(a));
switch (datatype) {
case mxINT32_CLASS:
a_int32 = (int*) mxGetData(a);
b_int32 = (int*) mxGetData(b);
out_int32 = (int*) mxGetData(out);
for (i=0; i<n; i++) {
if (a_is_scalar) {
out_int32[i] = a_int32[i] + b_int32[i];
} else if (b_is_scalar) {
out_int32[i] = a_int32[i] + b_int32[0];
} else {
out_int32[i] = a_int32[i] + b_int32[i];
}
}
break;
case mxINT16_CLASS:
a_int16 = (short*) mxGetData(a);
b_int16 = (short*) mxGetData(b);
out_int16 = (short*) mxGetData(out);
for (i=0; i<n; i++) {
if (a_is_scalar) {
out_int16[i] = a_int16[0] + b_int16[i];
} else if (b_is_scalar) {
out_int16[i] = a_int16[i] + b_int16[0];
} else {
out_int16[i] = a_int16[i] + b_int16[i];
}
}
break;
/* Yes, you'd have to add a separate case for every numeric mxClassID... */
/* In C++ you could do it with a template. */
default:
mexErrMsgTxt("Unsupported array type");
break;
}
plhs[0] = out;
}
Then you have to figure out how to invoke it from your Matlab code. If you're writing all the code, you could just call "c_plus(a, b)" instead of "a + b" everywhere. Alternately, you could create your own numeric wrapper class, e.g. #cnumeric, that holds a Matlab numeric array in its field and defines plus() and other operations that invoke the approprate C style MEX function.
classdef cnumeric
properties
x % the underlying Matlab numeric array
end
methods
function obj = cnumeric(x)
obj.x = x;
end
function out = plus(a,b)
[a,b] = promote(a, b); % for convenience, and to mimic Matlab implicit promotion
if ~isequal(class(a.x), class(b.x))
error('inputs must have same wrapped type');
end
out_x = c_plus(a.x, b.x);
out = cnumeric(out_x);
end
% You'd have to define the math operations that you want normal
% Matlab behavior on, too
function out = minus(a,b)
[a,b] = promote(a, b);
out = cnumeric(a.x - b.x);
end
function display(obj)
fprintf('%s = \ncnumeric: %s\n', inputname(1), num2str(obj.x));
end
function [a,b] = promote(a,b)
%PROMOTE Implicit promotion of numeric to cnumeric and doubles to int
if isnumeric(a); a = cnumeric(a); end
if isnumeric(b); b = cnumeric(b); end
if isinteger(a.x) && isa(b.x, 'double')
b.x = cast(b.x, class(a.x));
end
if isinteger(b.x) && isa(a.x, 'double')
a.x = cast(a.x, class(b.x));
end
end
end
end
Then wrap your numbers in the #cnumeric where you want C-style int behavior and do math with them.
>> cnumeric(int32(intmax))
ans =
cnumeric: 2147483647
>> cnumeric(int32(intmax)) - 1
ans =
cnumeric: 2147483646
>> cnumeric(int32(intmax)) + 1
ans =
cnumeric: -2147483648
>> cnumeric(int16(intmax('int16')))
ans =
cnumeric: 32767
>> cnumeric(int16(intmax('int16'))) + 1
ans =
cnumeric: -32768
There's your C-style overflow behavior, isolated from breaking the primitive #int32 type. Plus, you can pass a #cnumeric object in to other functions that are expecting regular numerics and it'll "work" as long as they treat their inputs polymorphically.
Performance caveat: because this is an object, + will have the slower speed of a method dispatch instead of a builtin. If you have few calls on large arrays, this'll be fast, because the actual numeric operations are in C. Lots of calls on small arrays, could slow things down, because you're paying the per method call overhead a lot.
I ran the following snippet of code
test = int32(2^31-12);
for i = 1:24
test = test + int32(1)
end
with unexpected results. It seems that, for Matlab, intmax('int32')+1==intmax('int32'). I'm running 2010a on a 64-bit Mac OS X.
Not sure that this as an answer, more confirmation that Matlab behaves counterintuitively. However, the documentation for the intmax() function states:
Any value larger than the value returned by intmax saturates to the intmax value when cast to a 32-bit integer.
So I guess Matlab is behaving as documented.
Hm, yes...
Actually, I was able to solve the problem with my custom "overflow"-Subroutine... Now it runs painfully slow, but without unexpected behaviour! My mistake was a missing round(), since Matlab/Octave will introduce small errors.
But if someone knows a faster solution, I would be glad to try it!
function ret = overflow_sg(arg,bw)
% remove possible rounding errors, and prepare returnvalue (if number is inside boundaries, nothing will happen)
ret = round(arg);
argsize = size(ret);
for i = 1:argsize(1)
for j = 1:argsize(2)
ret(i,j) = flow_sg(ret(i,j),bw);
end
end
end%function
%---
function ret = flow_sg(arg,bw)
ret = arg;
while (ret < (-2^(bw-1)))
ret = ret + 2^bw;
end
% Check for overflows:
while (ret > (2^(bw-1)-1))
ret = ret - 2^bw;
end
end%function
If 64 bits is enough to not overflow, and you need a lot of these, perhaps do this:
function ret = overflow_sg(arg,bw)
mask = int64(0);
for i=1:round(bw)
mask = bitset(mask,i);
end
topbit = bitshift(int64(1),round(bw-1));
subfrom = double(bitshift(topbit,1))
ret = bitand( int64(arg) , mask );
i = (ret >= topbit);
ret(i) = int64(double(ret(i))-subfrom);
if (bw<=32)
ret = int32(ret);
end
end
Almost everything is done as a matrix calculation, and a lot is done with bits, and everything is done in one step (no while loops), so it should be pretty fast. If you're going to populate it with rand, subtract 0.5 since it assumes it should round to integer values (rather than truncate).
I'm not a Java expert, but underlying Java classes available in Matlab should allow handling of overflows like C would. One solution I found, works only for single value, but it converts a number to the int16 (Short) or int32 (Integer) representation. You must do your math using Matlab double, then convert to Java int16 or int32, then convert back to Matlab double. Unfortunately Java doesn't appear to support unsigned types in this way, only signed.
double(java.lang.Short(hex2dec('7FFF')))
<br>ans = 32767
double(java.lang.Short(hex2dec('7FFF')+1))
<br>ans = -32768
double(java.lang.Short(double(intmax('int16'))+1))
<br>ans = -32768
double(java.lang.Integer(hex2dec('7FFF')+1))
<br>ans = 32768
https://www.tutorialspoint.com/java/lang/java_lang_integer.htm

What are # and : used for in Qbasic?

I have a legacy code doing math calculations. It is reportedly written in QBasic, and runs under VB6 successfully. I plan to write the code into a newer language/platform. For which I must first work backwards and come up with a detailed algorithm from existing code.
The problem is I can't understand syntax of few lines:
Dim a(1 to 200) as Double
Dim b as Double
Dim f(1 to 200) as Double
Dim g(1 to 200) as Double
For i = 1 to N
a(i) = b: a(i+N) = c
f(i) = 1#: g(i) = 0#
f(i+N) = 0#: g(i+N) = 1#
Next i
Based on my work with VB5 like 9 years ago, I am guessing that a, f and g are Double arrays indexed from 1 to 200. However, I am completely lost about this use of # and : together inside the body of the for-loop.
: is the line continuation character, it allows you to chain multiple statements on the same line. a(i) = b: a(i+N) = c is equivalent to:
a(i)=b
a(i+N)=c
# is a type specifier. It specifies that the number it follows should be treated as a double.
I haven't programmed in QBasic for a while but I did extensively in highschool. The # symbol indicates a particular data type. It is to designate the RHS value as a floating point number with double precision (similar to saying 1.0f in C to make 1.0 a single-precision float). The colon symbol is similar to the semicolon in C, as well, where it delimits different commands. For instance:
a(i) = b: a(i+N) = c
is, in C:
a[i] = b; a[i+N] = c;

How do I get real integer overflows in MATLAB/Octave?

I'm working on a verification-tool for some VHDL-Code in MATLAB/Octave. Therefore I need data types which generate "real" overflows:
intmax('int32') + 1
ans = -2147483648
Later on, it would be helpful if I can define the bit width of a variable, but that is not so important right now.
When I build a C-like example, where a variable gets increased until it's smaller than zero, it spins forever and ever:
test = int32(2^30);
while (test > 0)
test = test + int32(1);
end
Another approach I tried was a custom "overflow"-routine which was called every time after a number is changed. This approach was painfully slow, not practicable and not working in all cases at all. Any suggestions?
In MATLAB, one option you have is to overload the methods that handle arithmetic operations for integer data types, creating your own custom overflow behavior that will result in a "wrap-around" of the integer value. As stated in the documentation:
You can define or overload your own
methods for int* (as you can for any
object) by placing the appropriately
named method in an #int* folder within
a folder on your path. Type help
datatypes for the names of the methods
you can overload.
This page of the documentation lists the equivalent methods for the arithmetic operators. The binary addition operation A+B is actually handled by the function plus(A,B). Therefore, you can create a folder called #int32 (placed in another folder on your MATLAB path) and put a function plus.m in there that will be used instead of the built-in method for int32 data types.
Here's an example of how you could design your overloaded plus function in order to create the overflow/underflow behavior you want:
function C = plus(A,B)
%# NOTE: This code sample is designed to work for scalar values of
%# the inputs. If one or more of the inputs is non-scalar,
%# the code below will need to be vectorized to accommodate,
%# and error checking of the input sizes will be needed.
if (A > 0) && (B > (intmax-A)) %# An overflow condition
C = builtin('plus',intmin,...
B-(intmax-A)-1); %# Wraps around to negative
elseif (A < 0) && (B < (intmin-A)) %# An underflow condition
C = builtin('plus',intmax,...
B-(intmin-A-1)); %# Wraps around to positive
else
C = builtin('plus',A,B); %# No problems; call the built-in plus.m
end
end
Notice that I call the built-in plus method (using the BUILTIN function) to perform addition of int32 values that I know will not suffer overflow/underflow problems. If I were to instead perform the integer addition using the operation A+B it would result in a recursive call to my overloaded plus method, which could lead to additional computational overhead or (in the worst-case scenario where the last line was C = A+B;) infinite recursion.
Here's a test, showing the wrap-around overflow behavior in action:
>> A = int32(2147483642); %# A value close to INTMAX
>> for i = 1:10, A = A+1; disp(A); end
2147483643
2147483644
2147483645
2147483646
2147483647 %# INTMAX
-2147483648 %# INTMIN
-2147483647
-2147483646
-2147483645
-2147483644
If you want to get C style numeric operations, you can use a MEX function to call the C operators directly, and by definition they'll work like C data types.
This method is a lot more work than gnovice's overrides, but it should integrate better into a large codebase and is safer than altering the definition for built-in types, so I think it should be mentioned for completeness.
Here's a MEX file which performs the C "+" operation on a Matlab array. Make one of these for each operator you want C-style behavior on.
/* c_plus.c - MEX function: C-style (not Matlab-style) "+" operation */
#include "mex.h"
#include "matrix.h"
#include <stdio.h>
void mexFunction(
int nlhs, mxArray *plhs[],
int nrhs, const mxArray *prhs[]
)
{
mxArray *out;
/* In production code, input/output type and bounds checks would go here. */
const mxArray *a = prhs[0];
const mxArray *b = prhs[1];
int i, n;
int *a_int32, *b_int32, *out_int32;
short *a_int16, *b_int16, *out_int16;
mxClassID datatype = mxGetClassID(a);
int n_a = mxGetNumberOfElements(a);
int n_b = mxGetNumberOfElements(b);
int a_is_scalar = n_a == 1;
int b_is_scalar = n_b == 1;
n = n_a >= n_b ? n_a : n_b;
out = mxCreateNumericArray(mxGetNumberOfDimensions(a), mxGetDimensions(a),
datatype, mxIsComplex(a));
switch (datatype) {
case mxINT32_CLASS:
a_int32 = (int*) mxGetData(a);
b_int32 = (int*) mxGetData(b);
out_int32 = (int*) mxGetData(out);
for (i=0; i<n; i++) {
if (a_is_scalar) {
out_int32[i] = a_int32[i] + b_int32[i];
} else if (b_is_scalar) {
out_int32[i] = a_int32[i] + b_int32[0];
} else {
out_int32[i] = a_int32[i] + b_int32[i];
}
}
break;
case mxINT16_CLASS:
a_int16 = (short*) mxGetData(a);
b_int16 = (short*) mxGetData(b);
out_int16 = (short*) mxGetData(out);
for (i=0; i<n; i++) {
if (a_is_scalar) {
out_int16[i] = a_int16[0] + b_int16[i];
} else if (b_is_scalar) {
out_int16[i] = a_int16[i] + b_int16[0];
} else {
out_int16[i] = a_int16[i] + b_int16[i];
}
}
break;
/* Yes, you'd have to add a separate case for every numeric mxClassID... */
/* In C++ you could do it with a template. */
default:
mexErrMsgTxt("Unsupported array type");
break;
}
plhs[0] = out;
}
Then you have to figure out how to invoke it from your Matlab code. If you're writing all the code, you could just call "c_plus(a, b)" instead of "a + b" everywhere. Alternately, you could create your own numeric wrapper class, e.g. #cnumeric, that holds a Matlab numeric array in its field and defines plus() and other operations that invoke the approprate C style MEX function.
classdef cnumeric
properties
x % the underlying Matlab numeric array
end
methods
function obj = cnumeric(x)
obj.x = x;
end
function out = plus(a,b)
[a,b] = promote(a, b); % for convenience, and to mimic Matlab implicit promotion
if ~isequal(class(a.x), class(b.x))
error('inputs must have same wrapped type');
end
out_x = c_plus(a.x, b.x);
out = cnumeric(out_x);
end
% You'd have to define the math operations that you want normal
% Matlab behavior on, too
function out = minus(a,b)
[a,b] = promote(a, b);
out = cnumeric(a.x - b.x);
end
function display(obj)
fprintf('%s = \ncnumeric: %s\n', inputname(1), num2str(obj.x));
end
function [a,b] = promote(a,b)
%PROMOTE Implicit promotion of numeric to cnumeric and doubles to int
if isnumeric(a); a = cnumeric(a); end
if isnumeric(b); b = cnumeric(b); end
if isinteger(a.x) && isa(b.x, 'double')
b.x = cast(b.x, class(a.x));
end
if isinteger(b.x) && isa(a.x, 'double')
a.x = cast(a.x, class(b.x));
end
end
end
end
Then wrap your numbers in the #cnumeric where you want C-style int behavior and do math with them.
>> cnumeric(int32(intmax))
ans =
cnumeric: 2147483647
>> cnumeric(int32(intmax)) - 1
ans =
cnumeric: 2147483646
>> cnumeric(int32(intmax)) + 1
ans =
cnumeric: -2147483648
>> cnumeric(int16(intmax('int16')))
ans =
cnumeric: 32767
>> cnumeric(int16(intmax('int16'))) + 1
ans =
cnumeric: -32768
There's your C-style overflow behavior, isolated from breaking the primitive #int32 type. Plus, you can pass a #cnumeric object in to other functions that are expecting regular numerics and it'll "work" as long as they treat their inputs polymorphically.
Performance caveat: because this is an object, + will have the slower speed of a method dispatch instead of a builtin. If you have few calls on large arrays, this'll be fast, because the actual numeric operations are in C. Lots of calls on small arrays, could slow things down, because you're paying the per method call overhead a lot.
I ran the following snippet of code
test = int32(2^31-12);
for i = 1:24
test = test + int32(1)
end
with unexpected results. It seems that, for Matlab, intmax('int32')+1==intmax('int32'). I'm running 2010a on a 64-bit Mac OS X.
Not sure that this as an answer, more confirmation that Matlab behaves counterintuitively. However, the documentation for the intmax() function states:
Any value larger than the value returned by intmax saturates to the intmax value when cast to a 32-bit integer.
So I guess Matlab is behaving as documented.
Hm, yes...
Actually, I was able to solve the problem with my custom "overflow"-Subroutine... Now it runs painfully slow, but without unexpected behaviour! My mistake was a missing round(), since Matlab/Octave will introduce small errors.
But if someone knows a faster solution, I would be glad to try it!
function ret = overflow_sg(arg,bw)
% remove possible rounding errors, and prepare returnvalue (if number is inside boundaries, nothing will happen)
ret = round(arg);
argsize = size(ret);
for i = 1:argsize(1)
for j = 1:argsize(2)
ret(i,j) = flow_sg(ret(i,j),bw);
end
end
end%function
%---
function ret = flow_sg(arg,bw)
ret = arg;
while (ret < (-2^(bw-1)))
ret = ret + 2^bw;
end
% Check for overflows:
while (ret > (2^(bw-1)-1))
ret = ret - 2^bw;
end
end%function
If 64 bits is enough to not overflow, and you need a lot of these, perhaps do this:
function ret = overflow_sg(arg,bw)
mask = int64(0);
for i=1:round(bw)
mask = bitset(mask,i);
end
topbit = bitshift(int64(1),round(bw-1));
subfrom = double(bitshift(topbit,1))
ret = bitand( int64(arg) , mask );
i = (ret >= topbit);
ret(i) = int64(double(ret(i))-subfrom);
if (bw<=32)
ret = int32(ret);
end
end
Almost everything is done as a matrix calculation, and a lot is done with bits, and everything is done in one step (no while loops), so it should be pretty fast. If you're going to populate it with rand, subtract 0.5 since it assumes it should round to integer values (rather than truncate).
I'm not a Java expert, but underlying Java classes available in Matlab should allow handling of overflows like C would. One solution I found, works only for single value, but it converts a number to the int16 (Short) or int32 (Integer) representation. You must do your math using Matlab double, then convert to Java int16 or int32, then convert back to Matlab double. Unfortunately Java doesn't appear to support unsigned types in this way, only signed.
double(java.lang.Short(hex2dec('7FFF')))
<br>ans = 32767
double(java.lang.Short(hex2dec('7FFF')+1))
<br>ans = -32768
double(java.lang.Short(double(intmax('int16'))+1))
<br>ans = -32768
double(java.lang.Integer(hex2dec('7FFF')+1))
<br>ans = 32768
https://www.tutorialspoint.com/java/lang/java_lang_integer.htm