I understand that the int value is related to the storage capacity. But if I change int8 to int16, is the capacity the only thing that will be altered?
Since the other answers have not yet formalized this, I will give you a more elaborate explanation, including some keywords to look up.
The number in the name of an integer type is the number of bits used to store each value, and that determines the range of values it can hold.
int8 - 8 bits signed integer, MSB (most significant bit) representing the sign. Range [-2^7,2^7-1] = [-128,127]
uint8 - 8 bits unsigned integer, MSB denotes the highest power of 2. Range [0,2^8-1] = [0,255]
int16 - 16 bits signed integer. Range [-2^15,2^15-1] = [-32768,32767]
uint16 - 16 bits unsigned integer. Range [0,2^16-1] = [0,65535]
I could keep going but you probably get the picture. There is also int32, uint32, int64, uint64.
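The ranges listed above do not have to be memorized. As a small sketch (the variable names here are only illustrative), the built-in intmin and intmax functions report them for every integer class:

```matlab
% Print the representable range of each built-in integer class.
types = {'int8', 'uint8', 'int16', 'uint16', 'int32', 'uint32', 'int64', 'uint64'};
for k = 1:numel(types)
    lo = intmin(types{k});   % smallest representable value of this class
    hi = intmax(types{k});   % largest representable value of this class
    fprintf('%-6s  [%d, %d]\n', types{k}, lo, hi);
end
```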
There is also char, which is used for text but in many C-like languages is also commonly used in place of uint8 (in MATLAB, however, a char is 16 bits). The difference is that a char is printed as a character and not as a number.
The types single (float in C-like languages) and double are different, since they use a floating-point representation standardized by the IEEE (IEEE 754). The format can represent both very large and very small magnitudes with roughly constant relative precision.
This data type uses an exponential representation of the numbers. It allocates a fixed number of bits for the exponent and for the precision digits (the mantissa), plus one bit for the sign. A floating-point number can be divided like this,
(sign), exponent, mantissa
For double the bit allocation is 1 bit for sign, 11 bits for exponent, 52 bits for mantissa. For single it is 1 bit for sign, 8 bits for exponent, 23 bits for mantissa.
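To make the bit allocation concrete, here is a minimal sketch that pulls the three fields out of a double with typecast and the bitwise functions, then rebuilds the value from them. The variable names are my own, and the reconstruction formula only holds for normalized numbers (not 0, Inf, NaN or denormals).

```matlab
x = -2.345;                                   % any finite, normalized double
bits = typecast(x, 'uint64');                 % reinterpret the 8 bytes as one integer

signBit  = bitshift(bits, -63);                          % top bit
expBits  = bitand(bitshift(bits, -52), uint64(2047));    % next 11 bits, biased by 1023
mantBits = bitand(bits, uint64(2^52 - 1));               % low 52 bits

% Rebuild the value from its fields (normalized numbers only).
rebuilt = (-1)^double(signBit) * (1 + double(mantBits) / 2^52) ...
          * 2^(double(expBits) - 1023);

fprintf('sign = %d, biased exponent = %d, rebuilt == x: %d\n', ...
        signBit, expBits, rebuilt == x);
```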
This is some necessary background for discussing type conversion. When discussing type conversion you normally speak about implicit conversion and explicit conversion. The distinction is less prominent in Matlab, since Matlab determines the type of a variable automatically on assignment.
a = 2; % a is a double
b = a; % b is a double
c = int8(57); % c is an int8
d = c; % d is an int8
Explicit conversion, however, is done with the built-in conversion functions.
c = int8(57);
d = char(c);
When discussing different kinds of conversion we often talk about type promotion and type demotion. Type promotion is when a data type of lower precision is promoted to a type of higher precision.
a = int8(57);
b = int16(a);
This is lossless and considered safe. Type demotion, on the other hand, is when a type of higher precision is converted to a type of lower precision.
c = int16(1234);
d = int8(c); % Unsafe! Data loss: d saturates to 127
This is generally considered risky and should be avoided. The term type demotion is not used very often, since this conversion is uncommon. A conversion from higher to lower precision needs to be checked.
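function b = int16ToInt8(a)
    % Refuse to convert if any element does not fit into an int8.
    % a(:) makes the check work for matrices as well as vectors.
    if any(a(:) < -128 | a(:) > 127)
        error('Variable is out of range, conversion cannot be done.');
    end
    b = int8(a);
end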
In many languages type demotion cannot be done implicitly. Conversion between floating-point types and integer types should also be avoided where possible.
An important note here is how Matlab initializes variables. If a "constructor" is not used (like a=int8(57)), then Matlab will automatically make the variable double precision. Also, when you create a vector/matrix of integers like int64([257,3745,67]), the "constructor" for the matrix is called first, so int64() is actually called on a matrix of doubles. This is important because a double has only 53 bits of precision (52 stored mantissa bits plus one implicit bit), so an integer that needs more than that cannot be represented exactly as a double. So
int64([2^53+2^0, 2^54+2^1, 2^59+2^2]) ~= ...
[int64(2^53) + int64(2^0), int64(2^54) + int64(2^1), int64(2^59) + int64(2^2)]
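A short sketch of the same effect, with hypothetical variable names, showing how far apart the two constructions end up. The first line rounds the sums in double before int64() ever sees them; the second does the additions in int64.

```matlab
% Sums are evaluated (and rounded) in double, then converted.
lossy = int64([2^53 + 1, 2^54 + 2]);

% Each power of two is exact in double; the small terms are added in int64.
exact = [int64(2^53) + int64(1), int64(2^54) + int64(2)];

disp(exact - lossy)   % prints 1 2: the small terms were lost in the double sums
```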
Further, if the memory on the device allows it, it is commonly recommended to use int32 or int64 for integers and double for floating-point values.
Each type of integer has a different range of storage capacity:
int8: Values range from -128 to 127
int16: Values range from -32,768 to 32,767
You only need to worry about data loss when converting to a lower precision datatype. Because int16 is higher precision than int8, your existing data will remain intact, and your data can span a range 256 times as wide, at the cost of taking up twice as much space (2 bytes vs. 1 byte).
a = int8(127);
b = int16(a);
a == b
% 1
whos('a', 'b')
% Name Size Bytes Class Attributes
%
% a 1x1 1 int8
% b 1x1 2 int16
int8 variables can range from -128 to 127, while this range for int16 class is from -32,768 to 32,767. Obviously, memory is the price to pay for the wider range ;)
Note 1: These limits apply not only to variables when you define them, but usually also to the outputs of calculations!
Example:
>> A = int8([0, 10, 20, 30]);
>> A .^ 2
ans =
0 100 127 127
>> int16(A) .^ 2
ans =
0 100 400 900
Note 2: Once you switch to int16, usually you should do it for all variables that participate in calculations together.
Example:
>> A + int16(A)
Error using +
Integers can only be combined with integers of the same class, or scalar doubles.
I want to test a piece of a function, and starting from 127 I expect 127+1 = -128. But Matlab saturates my value, even though wrap-around is the behavior I want in my code.
There are explanations to disable this option on Simulink but what about for a script? I don't know how to disable this feature.
Wrap-around on overflow is not something Matlab integer arithmetic does; it saturates instead.
You need to implement this behaviour in your script using the modulo function (mod).
For example:
>> a=127; mod(a+128,256)-128
ans =
127
>> a=128; mod(a+128,256)-128
ans =
-128
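The same mod trick can be wrapped in a helper that works on whole arrays. This is only a sketch: wrapInt8 is a name I made up, and the addition is done in double so that nothing saturates before the wrap is applied.

```matlab
% Hypothetical helper: fold any integer-valued input into the int8 range [-128, 127].
wrapInt8 = @(x) int8(mod(double(x) + 128, 256) - 128);

a = int8([127 -128 100]);
b = int8([1 -1 100]);

c = wrapInt8(double(a) + double(b));   % add in double, then wrap
disp(c)                                 % -128   127   -56
```

For comparison, a + b on the int8 arrays directly would saturate to 127, -128 and 127.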
Since you use 127 and -128 as the examples, I assume you are working with int8 variable types. To get the modulo behavior you want, you could use a simple C mex routine to do the arithmetic (since your C compiler will in all likelihood optimize this overflow condition away as simple modulo behavior), or in m-code you can convert to a larger type and do the arithmetic yourself (assumes your machine uses 2's complement storage for integer types). E.g.,
a8 = int8(127); % sample data
b8 = int8(1); % sample data
a16 = int16(a8); % convert to larger type
b16 = int16(b8); % convert to larger type
c16 = a16 + b16 % do the addition in larger type
c16 = int16
128
c8s = typecast(c16,'int8') % typecast back to int8 (assume 2's complement storage)
c8s = 1x2
-128 0
c8 = c8s(1) % pick either c8s(1) or c8s(2) depending on endian of your machine
c8 = int8
-128
If you are working with arrays of numbers instead of scalars, then you could put this in a loop or vectorize the last line as either c8s(1:2:end) or c8s(2:2:end).
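A sketch of the vectorized variant, assuming 2's complement storage as above. Instead of hard-coding which byte to keep, it asks the computer function for the machine's byte order; the sample data and variable names are only illustrative.

```matlab
a8 = int8([127 -128 100]);              % sample data
b8 = int8([1 -1 100]);                  % sample data

c16 = int16(a8) + int16(b8);            % add in the larger type, no saturation
c8s = typecast(c16, 'int8');            % two int8 bytes per int16 element

[~, ~, endian] = computer;              % 'L' = little-endian, 'B' = big-endian
if strcmp(endian, 'L')
    c8 = c8s(1:2:end);                  % low byte comes first
else
    c8 = c8s(2:2:end);                  % low byte comes second
end
disp(c8)                                % -128   127   -56
```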
You may use a fi object from the Fixed-Point toolbox and set OverflowAction to Wrap.
Using fi just to get an int8 type that overflows is a bit of overkill, but possible.
Example:
x = fi(127, true, 8, 0, 'OverflowAction', 'Wrap', 'SumMode', 'SpecifyPrecision', 'SumWordLength', 8, 'SumFractionLength', 0);
x + 1
Output:
ans =
-128
DataTypeMode: Fixed-point: binary point scaling
Signedness: Signed
WordLength: 8
FractionLength: 0
RoundingMethod: Nearest
OverflowAction: Wrap
ProductMode: FullPrecision
SumMode: SpecifyPrecision
SumWordLength: 8
SumFractionLength: 0
CastBeforeSum: true
If you really want to use the int8 overflow and not simulate it with the mod function, you can use the typecast function.
First, you need to convert your variable into an int (otherwise it is by default a double in Matlab). Then you cast it to an int8 and you keep only the first byte:
>> a=127; getfield(typecast(int64(a),'int8'),{1})
ans =
int8
127
>> a=128; getfield(typecast(int64(a),'int8'),{1})
ans =
int8
-128
Say I have the following single-precision floating point number in Matlab
a = single(-2.345)
I would like to represent the number as an array of 4 bytes, following IEEE 754. The correct representation should be
b = [123, 20, 22, 192]
Currently, I am using fread and fwrite to do the conversion, as in
fid = fopen('test.dat','wb')
fwrite(fid,a,'float')
fclose(fid)
fid = fopen('test.dat','rb');
b = fread(fid)'
which works well enough, but I suspect there is a much easier and faster way to do the conversion without reading/writing from a file.
There have been a few posts about converting a byte array to a float (such as here), but I'm unsure how to proceed to go in the opposite direction. Any suggestions?
You can use the typecast function to cast between datatypes without changing the underlying data, i.e. reinterpret data using another type. In your case, you will want to cast from single to uint8 (byte) datatype. This is done by
a = single(-2.345);
typecast(a,'uint8')
ans =
123 20 22 192
as required.
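For completeness, a small sketch of the round trip and of getting the bytes in the opposite order. typecast always uses the machine's native byte order, so the [123 20 22 192] result above corresponds to a little-endian machine; swapbytes on the uint32 view reverses the four bytes if the other order is needed. The variable names are mine.

```matlab
a = single(-2.345);

bytes = typecast(a, 'uint8');           % 4 bytes in native byte order
back  = typecast(bytes, 'single');      % reinterpret the same bytes as a single again

% Same bytes in reversed order, e.g. for a big-endian wire format.
swapped = typecast(swapbytes(typecast(a, 'uint32')), 'uint8');

disp(back == a)                         % 1
disp(isequal(swapped, fliplr(bytes)))   % 1
```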
In this answer, gire mentioned that it is better not to use == when comparing doubles.
When creating an increment variable in a for loop using start:step:stop notation, its type will be double. If one wants to use this loop variable for indexing and == comparisons, might that cause problems due to floating point precision?!
Should one use integers? If so, is there a way to do so with the s:s:s notation?
Here's an example
a = rand(1, 5);
for ii = length(a):-1:1
if (ii == 1) % Comparing var of type double with ==
b = 0;
else
b = a(ii); % Using double for indexing
end
... % Code
end
Note that the floating point double specification uses 52 bits to store the mantissa (the part after the binary point), so you can exactly represent any integer in the range
-4503599627370496 <= x <= 4503599627370496
Note that this is larger than the range of an int32, which can only represent
-2147483648 <= x <= 2147483647
If you are just using the double as a loop variable, and only incrementing it in integer steps, and you are not counting above 4,503,599,627,370,496 then you are fine to use a double, and to use == to compare doubles.
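The bound quoted above is on the conservative side. As a quick check, MATLAB exposes the largest consecutive integer a double can hold as flintmax, and the following sketch shows where integer steps stop being exact:

```matlab
f = flintmax;                  % 2^53 = 9007199254740992

disp(f == 2^53)                % 1
disp((f - 1) + 1 == f)         % 1: integer steps below flintmax are exact
disp(f + 1 == f)               % 1: f + 1 is not representable and rounds back to f
```

A loop counter that stays below this value behaves exactly like an integer, even though its class is double.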
One reason people suggest for not using doubles is that you can't represent some common decimals exactly, e.g. 0.1 has no exact representation as a double. Therefore if you are working with monetary values, it may be better to separately store the data as an int and remember a scale factor of 10x or 100x or whatever.
It's sometimes bad to directly compare floating point numbers for equality because rounding issues can cause two floats to be not equal, even though the numbers are mathematically equal. This generally happens when the numbers are not exactly representable as floats, or when there is a significant size difference between the numbers, e.g.
>> 0.3 - 0.2 == 0.1
ans =
0
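When the values are not exactly representable, the usual workaround is to compare against a tolerance instead of using ==. A minimal sketch follows; the tolerance of 1e-12 is an arbitrary value picked for illustration and should be chosen to suit the magnitudes in your own data.

```matlab
d = 0.3 - 0.2;

disp(d == 0.1)                 % 0: the two values differ in the last bits

tol = 1e-12;                   % illustrative tolerance, not a universal constant
disp(abs(d - 0.1) < tol)       % 1: equal within tolerance
```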
If you're indexing between integer bounds with integer steps (even though the variable class is actually double), it is ok to use == for comparisons with other integers.
You can cast the indices, if you really want to be safe.
For example:
for ii = int16(length(a):-1:1)
if (ii == 1)
b = 0;
end
end
newT = [b(i) d(i) a(i) z(i)];
newT, b(i), a(i)
Prints
newT =
123 364 123 902
ans =
1.234e+02
ans =
1.234e+02
What is the problem here? Why are the first and third entry in newT rounded to integer values? Why aren't they correctly assigned?
Unlike most other programming languages, integer types in Matlab take precedence over floating point types. When you combine them, either through concatenation or arithmetic, the floating point values are implicitly narrowed to integers, instead of the integers being widened to floating point.
>> int32(3) + 0.4
ans =
3
>> [int32(3) 0.4]
ans =
3 0
This is for historical reasons, because (IIRC) Matlab originally didn't have support for integers at all, so all numeric constants in Matlab produce double values, and the promotion rules were created to make it possible to mix integer types with floating-point constants.
To fix this, explicitly convert those int types to doubles before concatenating.
newT = [b(i) double(d(i)) a(i) double(z(i))];
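A quick way to see what happened is to check the class of the result. In this sketch (variable names are illustrative), the mixed concatenation comes out as int32, while converting the integer first keeps everything double:

```matlab
v = [int32(3) 0.4];              % the double is converted to int32 and rounded
w = [double(int32(3)) 0.4];      % the integer is converted first, so w stays double

disp(class(v))                   % int32
disp(class(w))                   % double

s = int32(3) + 0.5;              % arithmetic behaves the same way: result is int32
disp(s)                          % 4 (3.5 is rounded to the nearest integer)
```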
How can I distinguish float values that have only zeroes after the decimal point from those that have some non-zero digits after it?
For example:
13.000000
13.120001
I want it like this:
13.0
13.120001
If your application stores the floating point values in something like C/C++'s floats or doubles, you may have a problem here.
First of all, float/double in the majority of cases represents values using base-2 and when you do something like double x = 0.1;, x in fact will never be equal to 0.1, but it will be very close to 0.1. That is because not every decimal (base-10) fraction can be represented exactly in base-2.
When you print or convert that x to a string using one of the printf-like functions, the resultant string can vary from something like 0.099999999 to 0.1 to 0.10000000000000000555. The result will depend on what's in x and on how you convert it to a string (how many digits you allow for the integer part, fractional part or all).
Generally there's no one-size-fits-all solution here. In one case the exactness and rounding issues can be unimportant, while in another they can be the most important. When performance is more important than the exact representation of decimal fractions, you use base-2 types. Otherwise, for example, when you're counting money, you use special types (and you may first need to construct them) that can represent decimal fractions exactly and then you avoid some of the described issues at a cost of extra computation time and maybe storage.
You may sometimes use fixed-point types and arithmetic. For example, if your numbers should not have more than 3 fractional digits and aren't too big, you can use scaled integers:
long x = 123456; // x represents 123.456
You add, subtract and compare them just as regular integers, but you multiply and divide them differently, for example like this:
long x = 123456; // x represents 123.456
long y = 12345; // y represents 12.345
long p = (long)(((long long)x * y) / 1000); // p=1524064, representing x*y=1524.064
long q = (long)(((long long)x * 1000) / y); // q=10000, representing x/y=10.000
And in this case it is relatively easy to figure out the digits of the fractional part, just look at (x%1000)/100, (x%100)/10 and x%10.
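The same scaled-integer idea can be sketched in MATLAB with int64 values. The names below are mine, and idivide is used with its default truncating behavior so that the results match the C integer divisions above.

```matlab
scale = int64(1000);               % three fractional decimal digits
x = int64(123456);                 % represents 123.456
y = int64(12345);                  % represents 12.345

p = idivide(x * y, scale);         % 1524064, represents 1524.064 (truncated)
q = idivide(x * scale, y);         % 10000,   represents 10.000   (truncated)

% Digits of the fractional part of x: tenths, hundredths, thousandths.
fracDigits = [idivide(mod(x, 1000), int64(100)), ...
              idivide(mod(x, 100),  int64(10)), ...
              mod(x, 10)];
disp(fracDigits)                   % 4   5   6
```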
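Split the value (e.g. 13.00000 or 13.20001) into two parts, the integer part and the fractional part, so that 13.0000 becomes 13 and 0000. Then check whether the second part is greater than 0. If it is (e.g. 13.20001, where 20001 > 0), keep the value as it is. Otherwise (e.g. 13.0000, where 0000 = 0), print it with a short format such as %.2f, which gives 13.00.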
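A minimal sketch of that check using MATLAB's printf-style sprintf. It assumes the values are held as doubles and that a whole number should be printed with one decimal, as in the question; the variable names are illustrative.

```matlab
vals = [13.000000, 13.120001];
out = cell(size(vals));

for k = 1:numel(vals)
    if vals(k) == fix(vals(k))
        out{k} = sprintf('%.1f', vals(k));   % no fractional part: short format
    else
        out{k} = sprintf('%f', vals(k));     % keep the default six decimals
    end
end

disp(out)
```

Note that the comparison with fix only succeeds for values that are exactly whole; a value such as 13.0000001 still takes the long format.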