I am casting a 64-bit fixed point number to floating point. How should that be done in MATLAB? The two approaches below give different results. What is the difference between typecast(x, 'double') and double(x)?
temp = 2^32*uint64(MSB) + uint64(LSB);
out_0(1, 1) = typecast(temp, 'double');
out_1(1, 1) = double(temp);
An Example:
temp = uint64(4618350711997530112)
data = typecast(temp, 'double')
data =
5.9194
>> double(temp)
ans =
4.6184e+18
If you want to maintain the same number, you should definitely use double to convert it:
double Convert to double precision.
double(X) returns the double precision value for X.
Whereas typecast maintains the internal representation, i.e. the bytes stay the same but are interpreted differently:
typecast Convert datatypes without changing underlying data.
Y = typecast(X, DATATYPE) convert X to DATATYPE. If DATATYPE has
fewer bits than the class of X, Y will have more elements than X. If
DATATYPE has more bits than the class of X, Y will have fewer
elements than X.
Note that typecast only works when the total number of bytes in the input fits the output type exactly, because it reinterprets the bytes and never changes them. double has no such restriction, as it converts each element to the closest value representable in double precision. For example, you cannot typecast a single uint32 (4 bytes) to double (8 bytes), but you can typecast two uint32 values to one double. If you use double to convert them, you will obtain one and two doubles respectively.
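As a small sketch of the difference, using the value from the question (the variable names are only for illustration):

```matlab
u = uint64(4618350711997530112);   % value from the question

d_val  = double(u);                % same numeric value, about 4.6184e18
d_bits = typecast(u, 'double');    % same 8 bytes reinterpreted, about 5.9194

% Reinterpreting is reversible because the bytes never change.
u_back = typecast(d_bits, 'uint64');

% Two uint32 values (8 bytes) reinterpret to ONE double,
% while double() converts element by element and keeps both.
pair = uint32([1 2]);
n_typecast = numel(typecast(pair, 'double'));   % 1 element
n_double   = numel(double(pair));               % 2 elements
```

So double changes the bytes to keep the value, and typecast keeps the bytes and lets the value change.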
C++ equivalent
X = double(uint64(123));
=> uint64_t x = 123; double X = static_cast<double>(x);
X = typecast(uint64(123), 'double')
=> uint64_t x = 123; double X; std::memcpy(&X, &x, sizeof X); // reinterpret_cast<double>(x) does not compile
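In addition, it seems you have two 32-bit unsigned values, MSB and LSB. To combine them into a uint64 you can use typecast. typecast uses the machine's byte order, so on a little-endian machine the low word must come first (both values must be of class uint32):
U = typecast([LSB,MSB],'uint64')
Then convert to double as suggested by m7913d: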
D = double(U)
So you see typecast has a very different function compared to double.
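Here is a hedged sketch of combining the two words. The MSB and LSB values are hypothetical, chosen so that the result matches the number from the question, and the typecast route assumes a little-endian machine:

```matlab
MSB = uint32(1075293568);   % hypothetical high word
LSB = uint32(3838377984);   % hypothetical low word

% Arithmetic route: does not depend on byte order.
U_shift = bitor(bitshift(uint64(MSB), 32), uint64(LSB));

% typecast route: on a little-endian machine the low word comes first.
U_cast = typecast([LSB, MSB], 'uint64');

D_value = double(U_shift);             % numeric conversion, about 4.6184e18
D_bits  = typecast(U_shift, 'double'); % bit pattern read as a double, about 5.9194
```

Which of the last two lines you want depends on whether the 64 bits hold an integer (use double) or already are an IEEE double (use typecast).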
Related
X= 1:63;
n = 6;
% Y = int2bit(X,n)
y=dec2bin(X, n)
With this example I tried str2double(y) and got NaN.
What is the problem?
str2double will only convert text that represents a single real or complex scalar value. Here y is a char matrix with one binary string per row, and str2double basically tries to interpret all of y as one number. Hence, it will return NaN or Inf depending on the version of MATLAB you are using.
You can convert each row with convertCharsToStrings and then use str2double, e.g.
for i = 1:size(y,1)
    tempvar = convertCharsToStrings(y(i,:));
    x1(i) = str2double(tempvar);
end
Or, if you just want to convert all rows in one call, use
arrayfun(@(i)str2double(convertCharsToStrings(y(i,:))),1:size(y,1))
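One thing to keep in mind is that str2double reads the digits as decimal. If the goal is to get the original integers back, bin2dec is probably what you want. A minimal sketch (variable names are only for illustration):

```matlab
X = 1:63;
y = dec2bin(X, 6);                    % 63-by-6 char matrix, one row per number

% Row by row, digits read as DECIMAL text: '000101' becomes 101.
asDecimal = str2double(cellstr(y));   % 63-by-1 double

% Row by row, digits read as BINARY text: '000101' becomes 5.
asBinary = bin2dec(y);                % 63-by-1 double, recovers X
```

cellstr turns each row of the char matrix into its own cell, which is a form str2double accepts.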
I am trying to calculate the limit of a function. Here is what I did:
syms h
x = 0;
f = (cos(x)*cos(h/2)*sin(h/2))/(h/2) - (sin(x)*sin(h/2)*sin(h/2))/(h/2);
limit(f,h,0)
ans =
1
limit(f,h,1)
ans =
2*cos(1/2)*sin(1/2)
I want to see what the numeric value of 2*cos(1/2)*sin(1/2) is. How do I obtain this value?
You can use double to evaluate the final expression:
double(limit(f,h,1))
ans =
0.8415
limit is a symbolic function, so it returns symbolic expressions. You can use double (or single, or whatever numeric type you want) to convert the result to a number.
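Put together, a sketch of the whole flow could look like this. It assumes the Symbolic Math Toolbox is available. vpa is an alternative when you want more digits but still a symbolic result:

```matlab
syms h
x = 0;
f = (cos(x)*cos(h/2)*sin(h/2))/(h/2) - (sin(x)*sin(h/2)*sin(h/2))/(h/2);

L = limit(f, h, 1);        % symbolic: 2*cos(1/2)*sin(1/2)
asDouble = double(L);      % ordinary double, about 0.8415 (this is sin(1))
asVpa = vpa(L, 20);        % still symbolic, shown with 20 significant digits
```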
I understand that the int type is related to storage capacity. But if I change int8 to int16, is only the capacity altered?
Since the other answers have not yet formalized this, I will give you a more elaborate explanation, including some keywords to look up.
The number in the type name is the number of bits used to store each value, and that determines the range of values the type can hold.
int8 - 8 bits signed integer, MSB (most significant bit) representing the sign. Range [-2^7,2^7-1] = [-128,127]
uint8 - 8 bits unsigned integer, MSB denotes the highest power of 2. Range [0,2^8-1] = [0,255]
int16 - 16 bits signed integer
uint16 - 16 bits unsigned integer
I could keep going but you probably get the picture. There is also int32, uint32, int64, uint64.
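If you do not want to memorize the ranges, intmin and intmax report them for any integer class. A short sketch (the list of type names is just an example selection):

```matlab
types = {'int8', 'uint8', 'int16', 'uint16', 'int32', 'uint32'};
for k = 1:numel(types)
    lo = intmin(types{k});   % smallest value of that class
    hi = intmax(types{k});   % largest value of that class
    fprintf('%-7s %d to %d\n', types{k}, lo, hi);
end
```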
There is also char, which is used for text but can also hold small integers in place of uint8 (in MATLAB a char is 16 bits, though). The difference is that a char is displayed as a character and not as a number. This is common in many C-like languages.
The types single (float in C-like languages) and double are different, since they use a floating-point representation standardized by the IEEE (IEEE 754). The format can represent both very large and very small numbers with a fixed relative precision.
This data type uses an exponential representation of numbers. It allocates a fixed number of bits for the exponent and for the precision digits (the mantissa), plus one bit for the sign. A floating-point number can be divided like this:
(sign),Mantissa,exponent
For double the bit allocation is 1 bit for the sign, 11 bits for the exponent and 52 bits for the mantissa. For single it is 1 bit for the sign, 8 bits for the exponent and 23 bits for the mantissa.
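To make the layout concrete, here is a sketch that pulls the three fields out of a double and rebuilds the value from them. The example value is arbitrary, and the rebuild formula only covers normal numbers (not zero, denormals, Inf or NaN):

```matlab
v = -6.5;                               % arbitrary example value
bits = typecast(v, 'uint64');           % raw 64-bit pattern of the double

signBit   = bitshift(bits, -63);                         % top bit
expField  = bitand(bitshift(bits, -52), uint64(2047));   % next 11 bits, biased by 1023
fracField = bitand(bits, uint64(2^52 - 1));              % low 52 bits

% value = (-1)^sign * (1 + fraction / 2^52) * 2^(exponent - 1023)
rebuilt = (-1)^double(signBit) * (1 + double(fracField) / 2^52) ...
          * 2^(double(expField) - 1023);
```

For -6.5 the sign bit is 1 and the exponent field is 1025, since 6.5 = 1.625 * 2^2 and 2 + 1023 = 1025.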
This is some necessary background for discussing type conversion. When discussing type conversion you normally speak about implicit conversion and explicit conversion. The distinction is not very relevant for MATLAB, since a variable simply takes on the type of the value assigned to it.
a = 2; % a is a double
b=a; % b is a double
c = int8(57); % c is an int8
d = c; % d is an int8
However explicit conversion is done with the built in conversion functions.
c = int8(57);
d = char(c);
When discussing different kinds of conversion we often talk about type promotion and type demotion. Type promotion is when a data type of lower precision is promoted to a type of higher precision.
a = int8(57);
b = int16(a);
This is lossless and considered safe. Type demotion, on the other hand, is when a type of higher precision is converted to a type of lower precision.
c = int16(1234);
d = int8(c); % Unsafe! Data loss
This is generally considered risky and should be avoided. The term type demotion is not used very often, since this conversion is uncommon. A conversion from higher to lower precision needs to be checked.
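function b = int16ToInt8(a)
    % Refuse to convert if any element is outside the int8 range [-128, 127].
    % a(:) makes the check cover every element of a matrix, not just columns.
    if any(a(:) < -128 | a(:) > 127)
        error('Variable is out of range, conversion cannot be done.');
    end
    b = int8(a);
end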
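It helps to know what MATLAB does when you skip such a check: integer conversions saturate at the limits of the target type instead of wrapping around, and floating-point inputs are rounded to the nearest integer. A small sketch (variable names are only for illustration):

```matlab
c = int16(1234);
d = int8(c);                 % saturates at intmax('int8'), i.e. 127
e = int8(int16(-1234));      % saturates at intmin('int8'), i.e. -128
r = int8(3.7);               % rounds to the nearest integer, i.e. 4

% Range check against the limits of the target type before demoting.
lo = double(intmin('int8'));
hi = double(intmax('int8'));
fits = (c >= lo) && (c <= hi);   % false for 1234
```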
In most languages type demotion cannot be done implicitly. Also conversion between floating point types and integer types should be avoided.
An important note here is how MATLAB initializes variables. If a "constructor" is not used (like a=int8(57)), then MATLAB automatically makes the variable double precision. Also, when you create a vector/matrix of integers like int64([257,3745,67]), the "constructor" for the matrix is called first, so int64() is actually called on a matrix of doubles. This is important because a double has only 53 bits of precision (52 stored mantissa bits plus an implicit leading bit), so an integer that needs more bits than that cannot be stored exactly in a double. So
int64([2^53+2^0, 2^54+2^1, 2^59+2^2]) ~= ...
[int64(2^53) + int64(2^0), int64(2^54) + int64(2^1), int64(2^59) + int64(2^2)]
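The first element of that comparison can be checked on its own. This is a minimal sketch, and the variable names are only for illustration:

```matlab
viaDouble = int64(2^53 + 1);           % the sum is rounded to 2^53 in double first
viaInt    = int64(2^53) + int64(1);    % the addition happens in int64 and is exact

lost = viaInt - viaDouble;             % the 1 that the double could not hold
limit53 = flintmax;                    % 2^53, largest consecutive integer in a double
```

So the rule of thumb is to convert to int64 first and do the arithmetic afterwards whenever values may exceed flintmax.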
Further, if the memory on the device allows it, it is commonly recommended to use int32 or int64 for integers and double for floating point.
Each integer type has a different range of values it can store:
int8: Values range from -128 to 127
int16: Values range from -32,768 to 32,767
You only need to worry about data loss when converting to a lower precision datatype. Because int16 is higher precision than int8, your existing data will remain intact, and your data can span a range 256 times as wide at the cost of taking up twice as much space (2 bytes vs. 1 byte).
a = int8(127);
b = int16(a);
a == b
% 1
whos('a', 'b')
% Name Size Bytes Class Attributes
%
% a 1x1 1 int8
% b 1x1 2 int16
int8 variables can range from -128 to 127, while this range for int16 class is from -32,768 to 32,767. Obviously, memory is the price to pay for the wider range ;)
Note 1: These limits apply not only to variables when you define them, but usually also to the results of calculations!
Example:
>> A = int8([0, 10, 20, 30]);
>> A .^ 2
ans =
0 100 127 127
>> int16(A) .^ 2
ans =
0 100 400 900
Note 2: Once you switch to int16, usually you should do it for all variables that participate in calculations together.
Example:
>> A + int16(A)
Error using +
Integers can only be combined with integers of the same class, or scalar doubles.
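A sketch of the combinations that do work, following the rule in that error message (variable names are only for illustration):

```matlab
A = int8([0, 10, 20, 30]);

B = int16(A) + int16(A);     % same integer class on both sides: allowed
C = A + 5;                   % integer array plus a scalar double: allowed, stays int8
D = int16(A) .^ 2;           % widen first, then compute, to avoid saturation
```

In short, cast everything to the wider class before doing the arithmetic.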
In this answer gire recommended not using == when comparing doubles.
When creating an increment variable in a for loop using start:step:stop notation, its type will be double. If one wants to use this loop variable for indexing and == comparisons, might that cause problems due to floating point precision?
Should one use integers? If so, is there a way to do so with the s:s:s notation?
Here's an example
a = rand(1, 5);
for ii = length(a):-1:1
if (ii == 1) % Comparing var of type double with ==
b = 0;
else
b = a(ii); % Using double for indexing
end
... % Code
end
Note that the double floating-point format uses 52 bits to store the mantissa (the fractional part of the significand, which has an implicit leading 1), giving 53 bits of precision, so you can exactly represent any integer in the range
-9007199254740992 <= x <= 9007199254740992
That upper bound is 2^53, which MATLAB returns as flintmax. Note that this is far larger than the range of an int32, which can only represent
-2147483648 <= x <= 2147483647
If you are just using the double as a loop variable, only incrementing it in integer steps, and not counting above 9,007,199,254,740,992, then you are fine to use a double, and to use == to compare doubles.
One reason people suggest for not using doubles is that you can't represent some common decimals exactly, e.g. 0.1 has no exact representation as a double. Therefore if you are working with monetary values, it may be better to separately store the data as an int and remember a scale factor of 10x or 100x or whatever.
It's sometimes bad to directly compare floating point numbers for equality because rounding issues can cause two floats to be not equal, even though the numbers are mathematically equal. This generally happens when the numbers are not exactly representable as floats, or when there is a significant size difference between the numbers, e.g.
>> 0.3 - 0.2 == 0.1
ans =
0
If you're indexing between integer bounds with integer steps (even though the variable class is actually double), it is ok to use == for comparisons with other integers.
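A sketch of both cases. The tolerance of a few eps is an assumed choice, and a suitable value depends on how the numbers were computed:

```matlab
d = 0.3 - 0.2;
exactEqual = (d == 0.1);               % false: the results differ in the last bits

tol = 4 * eps(0.1);                    % assumed tolerance, scaled to the values compared
closeEnough = abs(d - 0.1) <= tol;     % true

% Integer-valued doubles from a colon range are exact, so == is safe.
idx = 5:-1:1;
allWhole = all(idx == round(idx));     % true
lastIsOne = (idx(end) == 1);           % true
```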
You can cast the indices, if you really want to be safe.
For example:
for ii = int16(length(a):-1:1)
if (ii == 1)
b = 0;
end
end
I would let the output speak for itself:
>> numFiles, meanTangle, sdTangle
numFiles =
526
meanTangle =
0.4405
sdTangle =
0.1285
Now, when I create a vector out of these variables:
>> [numFiles meanTangle sdTangle]
ans =
526 0 0
Also, just for clarification:
>> class(numFiles)
ans =
int32
>> class(meanTangle)
ans =
double
>> class(sdTangle)
ans =
double
Why does MATLAB convert floats (meanTangle and sdTangle) to int without cast?
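It converts all of your doubles to int32 because your array contains one integer element. When integers and doubles are concatenated, MATLAB converts the whole result to the integer class, so the fractional values are rounded (here to 0).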
It converts the entire array into type int32:
>> class(ans)
ans =
int32
For reasons not explained, combining an integer data type in an array with floating point data is defined by MATLAB to return an integer data type.
Check this for more info: Float becomes integer
Your numFiles is an integer here, so it converts all the other variables to integers as well.
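If you want to keep the fractional values, one way is to cast the integer to double before concatenating. A minimal sketch with the values from the question:

```matlab
numFiles   = int32(526);
meanTangle = 0.4405;
sdTangle   = 0.1285;

mixed = [numFiles meanTangle sdTangle];           % int32 result: 526 0 0
kept  = [double(numFiles) meanTangle sdTangle];   % double result, fractions preserved
```

A cell array or a struct would also work if you would rather keep numFiles as int32.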