realmax on my machine is:
1.7977e+308
I know I have to write my code in a way to avoid long integer calculations, but is there any way to increase the limit?
I mean something like gmp library in C
You may find vpa (variable- precision arithmetic) helpful:
R = vpa(A) uses variable-precision arithmetic (VPA) to compute each element of A to at least d decimal digits of accuracy, where d is the current setting of digits.
R = vpa(A,d) uses at least d significant (nonzero) digits, instead of the current setting of digits.
Here's an example how to use it:
>> x = vpa('10^500/20')
ans =
5.0e498
Note that:
The output x is of symbolic (sym) type. Of course, you shouldn't convert it to double, because it would exceed realmax:
>> double(x)
ans =
Inf
Use string input in order to avoid evaluating large input values as double. For example, this doesn't work
>> vpa(10^500/20)
ans =
Inf
because 10^500 is evaluated as double, giving inf, and then is used as an input to vpa.
Related
For example, I have the symbolic value which is 's+5/2'. Is there a way to display it like 's+2.5'?
Yes, use vpa. Simply take your symbolic expression and use the vpa function to facilitate the conversion. vpa evaluates each term in the symbolic expression and converts each value to using up to 32 significant digits whenever possible. You can also override the amount of significant digits with the second parameter to vpa, but that's not needed in your case.
Here's a quick example:
>> syms s
>> A = s + 5/2
A =
s + 5/2
>> vpa(A)
ans =
s + 2.5
You can also set the initial approximation mode of the numeric literal to decimal mode, where the precision is dictated by digits, using the 'd' flag of the sym function:
>> expr = sym('s') + sym(5/2,'d')
expr =
s + 2.5
I want to convert 64 bit numbers from binary to decimal. Since dec2bin only supports up to 52 bits, I thought I could roll my own function and use uint64 to go beyond this limit:
function [dec] = my_bin2dec(bin)
v = uint64(length(bin)-1:-1:0);
base = uint64(2).^v;
dec = uint64(sum(uint64(base.*(uint64(bin-'0')))));
end
However, it does not work as expected:
my_bin2dec('111000000000000000000000000000000000001010110101011101000001110')
ans =
8070450532270651392
my_bin2dec('111000000000000000000000000000000000001010110101011101000001111')
ans =
8070450532270651392
Whereas this is the correct result:
(111000000000000000000000000000000000001010110101011101000001110)bin
= (8070450532270651918)dec
(111000000000000000000000000000000000001010110101011101000001111)bin
= (8070450532270651919)dec
What am I missing? It seems like there is some operation still performed using 52bit double arithmetic, but I don't know which one.
I checked if the operations are available for uint64 and it seems that the ones I use (power, times, sum) are there:
>> methods uint64
Methods for class uint64:
abs bitxor diff isinf mod plus sum
accumarray bsxfun display isnan mpower power times
all ceil eq issorted mrdivide prod transpose
and colon find ldivide mtimes rdivide tril
any conj fix le ne real triu
bitand ctranspose floor linsolve nnz rem uminus
bitcmp cummax full lt nonzeros reshape uplus
bitget cummin ge max not round xor
bitor cumprod gt min nzmax sign
bitset cumsum imag minus or sort
bitshift diag isfinite mldivide permute sortrowsc
You were right in saying that
It seems like there is some operation still performed using 52bit double arithmetic.
The problem is in line
dec = uint64(sum(uint64(base.*(uint64(bin-'0')))));
The operation sum(uint64(base.*(uint64(bin-'0')))) gives a double result, which only has about 15 significant digits. That's why your lowest digits are wrong. Subsequent conversion into uint64 doesn't help, because precision has already been lost.
The solution is to sum natively in uint64. This gives a uint64 result with its full precision:
dec = sum(uint64(base.*(uint64(bin-'0'))), 'native');
Had the same thought as #beaker, break it into chunks:
%% dec2bin
x=intmax('uint64')
MSBs = dec2bin( bitshift(x,-32) ,32)
LSBs = dec2bin( bitand(x, hex2dec('FFFFFFFF')) ,32)
y = [MSBs LSBs]
%% bin2dec
MSBs = y(1:32)
LSBs = y(33:64)
z = bitor( bitshift( uint64(bin2dec(MSBs)) , 32 ) , uint64(bin2dec(LSBs)) )
% (now x = z)
Oddly enough, it seems that dec2bin doesn't give an error, but does give incorrect answers for 64 bit numbers:
dec2bin( intmax('uint64') )
ans =
10000000000000000000000000000000000000000000000000000000000000000
I have a strange problem with hex2dec function in Matlab.
I realized in 16bytes data, it omits 2 LSB bytes.
hex2dec('123123123123123A');
dec2hex(ans)
Warning: At least one of the input numbers is larger than the largest integer-valued floating-point
number (2^52). Results may be unpredictable.
ans =
1231231231231200
I am using this in Simulink. Therefore I cannot process 16byte data. Simulink interpret this as a 14byte + '00'.
You need to use uint64 to store that value:
A='123123123123123A';
B=bitshift(uint64(hex2dec(A(1:end-8))),32)+uint64(hex2dec(A(end-7:end)))
which returns
B =
1310867527582290490
An alternative way in MATLAB using typecast:
>> A = '123123123123123A';
>> B = typecast(uint32(hex2dec([A(9:end);A(1:8)])), 'uint64')
B =
1310867527582290490
And the reverse in the opposite direction:
>> AA = dec2hex(typecast(B,'uint32'));
>> AA = [AA(2,:) AA(1,:)]
AA =
123123123123123A
The idea is to treat the 64-integer as two 32-bit integers.
That said, Simulink does not support int64 and uint64 types as others have already noted..
newT = [b(i) d(i) a(i) z(i)];
newT, b(i), a(i)
Prints
newT =
123 364 123 902
ans =
1.234e+02
ans =
1.234e+02
What is the problem here? Why are the first and third entry in newT rounded to integer values? Why aren't they correctly assigned?
Unlike most other programming languages, integer types in Matlab take precedence over floating point types. When you combine them, either through concatenation or arithmetic, the floating point values are implicitly narrowed to integers, instead of the integers being widened to floating point.
>> int32(3) + 0.4
ans =
3
>> [int32(3) 0.4]
ans =
3 0
This is for historical reasons, because (IIRC) Matlab originally didn't have support for integers at all, so all numeric constants in Matlab produce double values, and the promotion rules were created to make it possible to mix integer types with floating-point constants.
To fix this, explicitly convert those int types to doubles before concatenating.
newT = [b(i) double(d(i)) a(i) double(z(i))];
I'm working in Matlab using Non-negative Matrix factorization to decompose a matrix into two factors. Using this I get from A two double precision floating point matrices, B and C.
sample results are
B(1,1) = 0.118
C(1,1) = 112.035
I am now trying to modify specific bits within these values but using the bitset function on either values I get an error because bitset requires unsigned integers.
I have also tried using dec2bin function, which I assumed would convert decimals to binary but it returns '0' for B(1,1).
Does anyone know of any way to deal with floats at bit level without losing precision?
You should look into the typecast and bitset functions. (Doc here and here respectively). That lets you do stuff like
xb = typecast( 1.0, 'uint64' );
xb = bitset( xb, 10, 1 );
typecast( xb, 'double' );
The num2hex and hex2num functions are your friends. (Though not necessarily very good friends; hexadecimal strings aren't the best imaginable form for working on binary floating-point numbers. You could split them into, say, 8-nybble chunks and convert each to an integer.)
From the MATLAB docs:
num2hex([1 0 0.1 -pi Inf NaN])
returns
ans =
3ff0000000000000
0000000000000000
3fb999999999999a
c00921fb54442d18
7ff0000000000000
fff8000000000000
and
num2hex(single([1 0 0.1 -pi Inf NaN]))
returns
ans =
3f800000
00000000
3dcccccd
c0490fdb
7f800000
ffc00000