Performing Bit modification on Floating point numbers in Matlab - matlab

I'm working in Matlab using Non-negative Matrix factorization to decompose a matrix into two factors. Using this I get from A two double precision floating point matrices, B and C.
Sample results are:
B(1,1) = 0.118
C(1,1) = 112.035
I am now trying to modify specific bits within these values, but calling the bitset function on either value gives an error because bitset requires unsigned integers.
I have also tried the dec2bin function, which I assumed would convert decimals to binary, but it returns '0' for B(1,1).
Does anyone know of any way to deal with floats at bit level without losing precision?

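You should look into the typecast and bitset functions. Together they let you do things like
xb = typecast( 1.0, 'uint64' );   % reinterpret the double's 64 bits as an unsigned integer
xb = bitset( xb, 10, 1 );         % set bit 10 (bit 1 is the least significant)
x = typecast( xb, 'double' );     % reinterpret the modified bits as a double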
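As a rough sketch of the same round trip applied to the sample value from the question (the variable names here are only illustrative): typecast reinterprets the existing 8 bytes instead of converting the value, so nothing is rounded on the way in or out.

```matlab
x  = 0.118;                         % sample value B(1,1) from the question
xb = typecast(x, 'uint64');         % same 64 bits, viewed as an unsigned integer

% Flip the least significant mantissa bit (bit 1 in MATLAB's numbering).
yb = bitxor(xb, uint64(1));
y  = typecast(yb, 'double');        % differs from x by one unit in the last place

% Set bit 64, the IEEE 754 sign bit, which negates the value.
neg = typecast(bitset(xb, 64, 1), 'double');

% Casting back without touching any bits recovers x exactly.
same = typecast(xb, 'double');
```

Comparing abs(y - x) with eps(x) is a quick way to check that only the last bit changed.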

The num2hex and hex2num functions are your friends. (Though not necessarily very good friends; hexadecimal strings aren't the best imaginable form for working on binary floating-point numbers. You could split them into, say, 8-nybble chunks and convert each to an integer.)
From the MATLAB docs:
num2hex([1 0 0.1 -pi Inf NaN])
returns
ans =
3ff0000000000000
0000000000000000
3fb999999999999a
c00921fb54442d18
7ff0000000000000
fff8000000000000
and
num2hex(single([1 0 0.1 -pi Inf NaN]))
returns
ans =
3f800000
00000000
3dcccccd
c0490fdb
7f800000
ffc00000
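A small sketch of how the hex route can be used to inspect and change bits, assuming double precision input (16 hex digits); the variable names are illustrative only.

```matlab
h = num2hex(1);                      % '3ff0000000000000', as in the docs output above
signAndExp = hex2dec(h(1:3));        % top 12 bits: sign bit plus 11 exponent bits
biasedExp  = mod(signAndExp, 2048);  % drop the sign bit, leaving the biased exponent

h(end) = '1';                        % set the lowest mantissa bit via the last nybble
y = hex2num(h);                      % the next double above 1

% The round trip through the hex string does not lose precision.
z = hex2num(num2hex(0.118));
```

Editing single bits this way means working one nybble at a time, which is why converting chunks of the string to integers may be more convenient.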

Related

Octave show fraction when divide without calculating the value

The calculation of 1/23 * [1 2 3] gives me [0.043478 0.086957 0.130435].
I just want a result like [1/23 2/23 3/23]
You can specify format rat, to always display outputs as rational approximations (applies to both matlab and octave).
format rat
a = 1/23 * [1,2,3]
% a = 1/23 2/23 3/23
Or you can use the rat or rats functions to get the rational approximations of a float array as strings:
a = 1/23 * [1,2,3]
% a = 0.043478 0.086957 0.130435
rats(a)
% ans = 1/23 2/23 3/23
As Cris pointed out in the comments, this is simply a representational issue. The underlying floating-point representation of the result does not change. If you wish to work with fractions in a 'mathematical' sense, then you need to go about this a different way (possibly using the symbolic package, or handling numerators and denominators manually).
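If handling numerators and denominators manually is the goal, one possible starting point is the two-output form of rat. This is only a sketch: n and d are ordinary double arrays holding an approximation within rat's default tolerance, not exact fractions.

```matlab
a = 1/23 * [1, 2, 3];
[n, d] = rat(a);     % numerators and denominators of the rational approximations

% a itself is still stored as floating point; n and d merely expose
% the approximation n./d as separate integer-valued parts.
b = n ./ d;
```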

uencode -signal processing matlab

I need to quantize and encode an input signal in MATLAB, so I plan to use the uencode function. The problem is that I am confused about how it works. The documentation says that it quantizes and encodes the input as integers, and then gives this example:
u = -1:0.01:1;
y = uencode(u,3);
plot(u,y,'.')
The output is just integers. Can somebody explain what exactly these integers are? And if I need the binary codes of the input u, what must I do to get them?
uencode takes the range of floating point numbers between -1.0 and 1.0, and maps it to the integers from 0 to (2^n)-1.
For example, with n=8, the possible integers are 0 to 255. -1.0 gets mapped to 0, +1.0 gets mapped to 255, and all decimal values in between get mapped to the closest integer.
In the code example you gave, n=3, so it is mapping to the integers 0 to 7. The plot shows horizontal lines because with so few integers available to map to, many floating point values map to the same integer.
To convert a base 10 integer to a base 2 binary string, use the function dec2bin.
>> dec2bin(5)
ans =
101
>> dec2bin(17)
ans =
10001
If you want leading zeros, say so that the strings are always 8 bits long, pass the minimum length as a second argument:
>> dec2bin(5, 8)
ans =
00000101
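Putting the two steps together for the example in the question might look like the sketch below. It assumes the Signal Processing Toolbox is available for uencode, and the cast to double is only a precaution in case dec2bin in a given release does not accept integer classes.

```matlab
% Requires the Signal Processing Toolbox for uencode.
u = -1:0.01:1;
y = uencode(u, 3);              % integer codes in the range 0..7
codes = dec2bin(double(y), 3);  % char matrix: one row per sample, 3 characters per row

codes([1 end], :)               % binary codes for u = -1 and u = +1
```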

Convert 64 bit numbers from binary to decimal using uint64

I want to convert 64 bit numbers from binary to decimal. Since bin2dec only supports up to 52 bits, I thought I could roll my own function and use uint64 to go beyond this limit:
function [dec] = my_bin2dec(bin)
v = uint64(length(bin)-1:-1:0);
base = uint64(2).^v;
dec = uint64(sum(uint64(base.*(uint64(bin-'0')))));
end
However, it does not work as expected:
my_bin2dec('111000000000000000000000000000000000001010110101011101000001110')
ans =
8070450532270651392
my_bin2dec('111000000000000000000000000000000000001010110101011101000001111')
ans =
8070450532270651392
Whereas this is the correct result:
(111000000000000000000000000000000000001010110101011101000001110)bin
= (8070450532270651918)dec
(111000000000000000000000000000000000001010110101011101000001111)bin
= (8070450532270651919)dec
What am I missing? It seems like there is some operation still performed using 52bit double arithmetic, but I don't know which one.
I checked if the operations are available for uint64 and it seems that the ones I use (power, times, sum) are there:
>> methods uint64
Methods for class uint64:
abs bitxor diff isinf mod plus sum
accumarray bsxfun display isnan mpower power times
all ceil eq issorted mrdivide prod transpose
and colon find ldivide mtimes rdivide tril
any conj fix le ne real triu
bitand ctranspose floor linsolve nnz rem uminus
bitcmp cummax full lt nonzeros reshape uplus
bitget cummin ge max not round xor
bitor cumprod gt min nzmax sign
bitset cumsum imag minus or sort
bitshift diag isfinite mldivide permute sortrowsc
You were right in saying that "it seems like there is some operation still performed using 52bit double arithmetic."
The problem is in this line:
dec = uint64(sum(uint64(base.*(uint64(bin-'0')))));
The operation sum(uint64(base.*(uint64(bin-'0')))) gives a double result, which only has about 15 significant digits. That's why your lowest digits are wrong. Subsequent conversion into uint64 doesn't help, because precision has already been lost.
The solution is to sum natively in uint64. This gives a uint64 result with its full precision:
dec = sum(uint64(base.*(uint64(bin-'0'))), 'native');
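For reference, a self-contained sketch of the whole conversion with native summation, written as a script rather than a function. Building the weights with bitshift is a substitution made here for illustration (the question uses uint64(2).^v); the key point from the answer is the 'native' flag.

```matlab
bin = '111000000000000000000000000000000000001010110101011101000001111';

bits    = uint64(bin - '0');                        % digit values 0 or 1
weights = bitshift(uint64(1), numel(bin)-1:-1:0);   % powers of two, built in uint64
dec     = sum(bits .* weights, 'native');           % accumulate in uint64, not double

class(dec)   % 'uint64'
```

The question lists 8070450532270651919 as the correct value for this input, so the last digit of dec is a convenient check that no precision was lost.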
Had the same thought as @beaker, break it into chunks:
%% dec2bin
x=intmax('uint64')
MSBs = dec2bin( bitshift(x,-32) ,32)
LSBs = dec2bin( bitand(x, hex2dec('FFFFFFFF')) ,32)
y = [MSBs LSBs]
%% bin2dec
MSBs = y(1:32)
LSBs = y(33:64)
z = bitor( bitshift( uint64(bin2dec(MSBs)) , 32 ) , uint64(bin2dec(LSBs)) )
% (now x = z)
Oddly enough, it seems that dec2bin doesn't give an error, but does give incorrect answers for 64 bit numbers:
dec2bin( intmax('uint64') )
ans =
10000000000000000000000000000000000000000000000000000000000000000
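One plausible explanation, offered as an assumption rather than something confirmed in the thread: if dec2bin converts its argument to double internally, the value is rounded before any digits are produced. The sketch below only demonstrates that rounding step, not the internals of dec2bin.

```matlab
x = intmax('uint64');    % 2^64 - 1
d = double(x);           % doubles carry 53 significant bits, so this must round

d == 2^64                % true: the conversion rounded up to 2^64
```

A value of 2^64 in binary is a 1 followed by 64 zeros, which is consistent with the output shown above.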

Is there any way to increase 'realmax' in MATLAB?

realmax on my machine is:
1.7977e+308
I know I have to write my code in a way to avoid long integer calculations, but is there any way to increase the limit?
I mean something like gmp library in C
You may find vpa (variable-precision arithmetic) helpful:
R = vpa(A) uses variable-precision arithmetic (VPA) to compute each element of A to at least d decimal digits of accuracy, where d is the current setting of digits.
R = vpa(A,d) uses at least d significant (nonzero) digits, instead of the current setting of digits.
Here's an example of how to use it:
>> x = vpa('10^500/20')
x =
5.0e498
Note that:
The output x is of symbolic (sym) type. Of course, you shouldn't convert it to double, because it would exceed realmax:
>> double(x)
ans =
Inf
Use string input in order to avoid evaluating large input values as double. For example, this doesn't work
>> vpa(10^500/20)
ans =
Inf
because 10^500 is evaluated as a double, giving Inf, which is then passed to vpa.
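An alternative to string input, sketched here under the assumption that the Symbolic Math Toolbox is installed, is to make the base symbolic before any arithmetic happens, so that no intermediate result is ever a double.

```matlab
% Requires the Symbolic Math Toolbox.
x = sym(10)^500 / 20;    % exact symbolic value, never evaluated as double
y = vpa(x);              % variable-precision approximation of the same value

isinf(double(y))         % converting back to double still overflows
```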

MATLAB - integers vs decimals assignment strange bug

newT = [b(i) d(i) a(i) z(i)];
newT, b(i), a(i)
Prints
newT =
123 364 123 902
ans =
1.234e+02
ans =
1.234e+02
What is the problem here? Why are the first and third entry in newT rounded to integer values? Why aren't they correctly assigned?
Unlike most other programming languages, integer types in Matlab take precedence over floating point types. When you combine them, either through concatenation or arithmetic, the floating point values are implicitly narrowed to integers, instead of the integers being widened to floating point.
>> int32(3) + 0.4
ans =
3
>> [int32(3) 0.4]
ans =
3 0
This is for historical reasons, because (IIRC) Matlab originally didn't have support for integers at all, so all numeric constants in Matlab produce double values, and the promotion rules were created to make it possible to mix integer types with floating-point constants.
To fix this, explicitly convert those int types to doubles before concatenating.
newT = [b(i) double(d(i)) a(i) double(z(i))];
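A minimal sketch for checking which type wins, using made-up values in place of the b and d arrays from the question (their actual types are not shown there, so the int32 here is an assumption):

```matlab
b = 123.4;                 % double
d = int32(364);            % integer type (assumed for illustration)

mixed = [b d];             % result is int32, so 123.4 is rounded to 123
class(mixed)               % 'int32'

fixed = [b double(d)];     % everything is double, so 123.4 is preserved
class(fixed)               % 'double'
```

Calling class on each input before concatenating is a quick way to find which variables need the double conversion.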