Precise division of doubles representing integers exactly (when they are divisible)

Given that 8-byte doubles can represent every 4-byte int exactly, I'm wondering whether dividing a double A that stores an integer by a double B that stores an integer (where the integer B divides A) always gives exactly the double corresponding to their integer quotient. In other words, if B and C are integers and B*C fits within a 32-bit int, is it guaranteed that
int B, C;  // any values such that B*C does not overflow a 32-bit int
double(B*C)/double(C) == double((B*C)/C) ?
Does the IEEE754 standard guarantee this?
In my testing it works for every example I've tried. In Python:
>>> (321312321.0*3434343.0)/321312321.0 == 3434343.0
True
The reason for asking is that MATLAB makes it hard to work with ints, so I often just use the default doubles for integer calculations. If the answer to this question is yes, then whenever I know the integers divide exactly I can avoid casting to ints, calling idivide(...), etc., all of which make the code less readable.
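IEEE 754 requires the basic operations, division included, to be correctly rounded: the result is the exact quotient rounded to the nearest double. If B*C and C are exactly representable and the exact quotient B is an integer with |B| <= 2^53, that quotient is itself a double, so rounding cannot change it and the comparison holds. A minimal Python spot-check of the argument, assuming Python floats are IEEE 754 doubles (true on all mainstream platforms); `check_exact_division` is a hypothetical helper name and the sampling ranges are arbitrary choices:

```python
import random


def check_exact_division(trials: int = 100_000, seed: int = 0) -> bool:
    """Spot-check that float(b*c) / float(c) == float(b) for integers b, c.

    |b*c| is kept within 2**53, so the product is exactly representable as a
    double; correct rounding of the division then forces an exact quotient.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        c = rng.randint(1, 2**31 - 1)
        bound = 2**53 // c
        b = rng.randint(-bound, bound)  # keeps |b*c| <= 2**53
        if float(b * c) / float(c) != float(b):
            return False
    return True


print(check_exact_division())                                 # True
print((321312321.0 * 3434343.0) / 321312321.0 == 3434343.0)   # True
```

If B*C exceeds 2^53, the product itself may already be rounded before the division happens, so the argument no longer applies.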

Luis Mendo's comment does answer this question, but to address the use in MATLAB specifically, there are some handy utilities described here. You can use eps(numberOfInterest) to find the distance from a number to the next larger double-precision floating-point number. For example:
eps(1) = 2^(-52)
eps(2^52) = 1
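Since the spacing between doubles in [2^52, 2^53) is exactly 1, every integer up to 2^53 (MATLAB's flintmax) is representable. This guarantees that addition, subtraction and multiplication of integers held in doubles are exact, and that division is exact whenever the true quotient is an integer, provided the operands and results stay within 2^53 in magnitude. That is far larger than anything a 32-bit int can hold.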
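Python exposes the same quantity as `math.ulp` (Python 3.9+), which for positive doubles gives the same value as MATLAB's eps. A short sketch of the spacing around 1, 2^52 and 2^53, assuming IEEE 754 doubles:

```python
import math

# math.ulp(x): gap between positive x and the next larger double, like eps(x).
print(math.ulp(1.0) == 2.0**-52)   # True, matches eps(1)
print(math.ulp(2.0**52) == 1.0)    # True, matches eps(2^52)
print(math.ulp(2.0**53) == 2.0)    # True: above 2^53 only even integers exist

flintmax = 2.0**53                 # same value as MATLAB's flintmax
print(flintmax + 1 == flintmax)    # True: 2^53 + 1 rounds back to 2^53
print(float(2**53 - 1) == 2**53 - 1)  # True: integers below 2^53 are exact
```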

Related

Scala Has Infinity but no Infinitesimal. Why?

Open a Scala interpreter.
scala> 1E-200 * 1E-200
res1: Double = 0.0
scala> 1E200 * 1E200
res2: Double = Infinity
A very large product value evaluates to Infinity.
A very small value evaluates to zero.
Why not be symmetrical and create something called Infinitesimal?

Basically this has to do with the way floating-point numbers work, which has more to do with your processor than with Scala. The exact product 1E-400 is smaller than the smallest positive double (about 4.9E-324), so the nearest representable value is +0 (positive zero) and the result underflows to 0.0. The exact product 1E400 is larger than the largest finite double (about 1.8E308), so it overflows and is replaced with +Infinity. Remember that floating-point numbers are a fixed-precision approximation. If you want a system that is more exact, you can use http://www.scala-lang.org/api/2.11.8/#scala.math.BigDecimal
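The same behaviour can be reproduced in Python, whose floats use the IEEE 754 double format that Scala's Double also uses. A small sketch, with Python's `decimal.Decimal` standing in (as an assumption of this example) for the arbitrary-precision BigDecimal linked above:

```python
import math
import sys
from decimal import Decimal

tiny = 1e-200 * 1e-200  # exact result 1e-400 is below the smallest subnormal
huge = 1e200 * 1e200    # exact result 1e400 is above the largest finite double

print(tiny)                # 0.0  (underflow to +0)
print(math.isinf(huge))    # True (overflow to +infinity)
print(sys.float_info.max)  # 1.7976931348623157e+308, largest finite double
print(5e-324)              # 5e-324, smallest positive (subnormal) double
print(5e-324 / 2)          # 0.0: halving it rounds to zero

# A decimal type with a much wider exponent range keeps the tiny product.
print(Decimal("1e-200") * Decimal("1e-200"))  # 1E-400
```

There is no "infinitesimal" value below 5e-324: the next step down is zero itself.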

Scala, just like Java, follows the IEEE 754 specification for floating-point numbers, which does not have "infinitesimals". I'm not quite sure infinitesimals would make much sense either way, as they have no interpretation as real numbers.

How to get the reverse (not the complement or inverse) of a binary number

I am implementing the Cooley-Tukey FFT (radix-2 DIF/DIT) algorithm in MATLAB. For the bit-reversal step I need the reverse of a binary number. Can anyone suggest how to get the reverse of a binary number (e.g. 100111 -> 111001)? Anyone who has worked on an FFT implementation is also welcome to help me with the algorithm.

Topic: How to do bit reversal in Matlab?
If you're using double-precision floating-point ('double') numbers which are integers, you can do this:
dr = bin2dec(fliplr(dec2bin(d,n))); % Bits in dr are in reverse order
where n is the number of bits to be reversed and 0 <= d < 2^n. You will experience no precision problems at all as long as the integers are no more than 52 bits long.
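The string-based approach above translates directly to Python. A minimal sketch, assuming non-negative inputs with 0 <= d < 2^n; the function names are hypothetical, and the permutation helper shows the index ordering a radix-2 FFT typically uses for its input or output:

```python
def reverse_bits(d: int, n: int) -> int:
    """Reverse the lowest n bits of d, assuming 0 <= d < 2**n.

    Mirrors bin2dec(fliplr(dec2bin(d, n))): pad to n binary digits,
    reverse the string, and parse it back as base 2.
    """
    if not 0 <= d < 2**n:
        raise ValueError("d must satisfy 0 <= d < 2**n")
    return int(format(d, f"0{n}b")[::-1], 2)


def bit_reversed_indices(n: int) -> list[int]:
    """Bit-reversal permutation of 0 .. 2**n - 1 (requires Python 3.9+)."""
    return [reverse_bits(i, n) for i in range(2**n)]


print(format(reverse_bits(0b100111, 6), "06b"))  # 111001
print(bit_reversed_indices(3))                   # [0, 4, 2, 6, 1, 5, 3, 7]
```

Python ints have arbitrary precision, so the 52-bit caveat for doubles does not apply here.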
And
Re: How to do bit reversal in Matlab?
How large are the numbers you need to reverse? May I ask what the purpose is? Maybe there is a more efficient way to solve the whole problem. If the numbers are large, you can just store the bits as a string. To reverse it, just read the string backwards! Or use fliplr().
(There may be better places to ask).
If it were VHDL I'd suggest an alias with 'REVERSE'RANGE.

Taken from the help section:
Y = swapbytes(X) reverses the byte ordering of each element in array X, converting little-endian values to big-endian (and vice versa). The input array must contain all full, noncomplex, numeric elements.
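Note that swapbytes reverses the order of bytes, not bits, so on its own it does not produce the bit-reversed index that an FFT needs. A small Python sketch contrasting the two on a 16-bit value; the helper names are hypothetical and the 16-bit width is an assumption chosen for illustration:

```python
def swap_bytes_16(x: int) -> int:
    """Reverse the byte order of a 16-bit value (what swapbytes does for uint16)."""
    return int.from_bytes(x.to_bytes(2, "little"), "big")


def reverse_bits_16(x: int) -> int:
    """Reverse the bit order of a 16-bit value (what a bit-reversal needs)."""
    return int(format(x, "016b")[::-1], 2)


x = 0x0102
print(hex(swap_bytes_16(x)))    # 0x201
print(hex(reverse_bits_16(x)))  # 0x4080
```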

How to stop matlab truncating long numbers

These two long numbers are the same except for the last digit.
test = [];
test(1) = 33777100285870080;
test(2) = 33777100285870082;
but the last digit is lost when the numbers are put in the array:
unique(test)
ans = 3.3777e+16
How can I prevent this? The numbers are ID codes and losing the last digit is screwing everything up.

MATLAB uses 64-bit floating-point representation for numbers by default. Those have roughly 15 to 17 significant decimal digits of precision, and your 17-digit numbers exceed that.
Use something like uint64 to store your numbers:
> test = [uint64(33777100285870080); uint64(33777100285870082)];
> disp(test(1));
33777100285870080
> disp(test(2));
33777100285870082
This is really a rounding error, not a display error. To get the correct strings for output purposes, use int2str, because, again, num2str uses a 64-bit floating point representation, and that has rounding errors in this case.

To add more explanation to rubenvb's solution: your values are greater than flintmax for IEEE 754 double-precision floating point, i.e., greater than 2^53. Beyond this point not all integers can be represented exactly as doubles; in the range where your values lie (between 2^54 and 2^55), consecutive doubles are 4 apart. See also this related question.
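The collapse can be reproduced outside MATLAB. A minimal Python sketch, assuming Python floats are IEEE 754 doubles; Python's arbitrary-precision ints play the role that uint64 plays in the MATLAB answer:

```python
a = 33777100285870080
b = 33777100285870082

print(a > 2**53)                  # True: both IDs are beyond flintmax
print(float(a) == float(b))       # True: both round to the same double
print(len({float(a), float(b)}))  # 1, the same collapse unique(test) shows
print(len({a, b}))                # 2: integer storage keeps them distinct
```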

How to use Bitxor for Double Numbers?

I want to use XOR on my double numbers in MATLAB, but bitxor only works for integer numbers. Is there a function that could convert a double to an int in MATLAB?

The functions you are looking for might be int8(number), int16(number), uint32(number), etc. Any of them will convert a double to an integer, but you must pick the one that best fits the result you want to achieve. Remember that you cannot cast a double to an integer without rounding the number.
If I understood you correctly, you could create a function that removes the "comma" (the decimal point) from the double by multiplying your starting value by 2^n, casting it to an integer with any of the functions mentioned above, performing whatever operation you want, and then moving the point back to its original position by dividing the result by 2^n.
Multiplying the starting value by 2^n is a hack that reduces the rounding error.
The ideal value for n is the number of fractional bits you need to keep (about 3.3 bits per decimal digit after the point), provided the scaled value still fits in the chosen integer type.
Please also explain why you are trying to do this; it does not seem to be the optimal solution.
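The scale, XOR, unscale idea above can be sketched in Python. This is a minimal illustration, assuming non-negative inputs; `xor_fixed` is a hypothetical helper, and Python's `round` (ties to even) stands in for the MATLAB integer casts, which round ties away from zero:

```python
def xor_fixed(x: float, y: float, n: int) -> float:
    """XOR two non-negative doubles through an n-bit fixed-point form.

    Scales by 2**n, rounds to integers, XORs the bits, then scales back.
    Any fractional bits beyond the n-th are rounded away.
    """
    if x < 0 or y < 0:
        raise ValueError("only non-negative inputs are handled here")
    scale = 2**n
    xi = round(x * scale)     # "remove the comma": shift the binary point n places
    yi = round(y * scale)
    return (xi ^ yi) / scale  # put the binary point back


print(xor_fixed(3.0, 5.0, 0))    # 6.0
print(xor_fixed(1.5, 0.25, 2))   # 1.75
```

With n = 2, 1.5 becomes 6 (binary 110) and 0.25 becomes 1 (binary 001); their XOR is 7, which scales back to 1.75.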

You can just cast to an integer:
a = 1.003
int8(a)
ans =
1
That gives you an 8-bit signed integer; you can also get other sizes (e.g. int16) or unsigned types (e.g. uint8), depending on what you want to do. Note that the cast rounds to the nearest integer, so the .003 is lost.

fortran90 reading array with real numbers

I have a list of real data in a file. The data looks like this:
25.935
25.550
24.274
29.936
23.122
27.360
28.154
24.320
28.613
27.601
29.948
29.367
I wrote Fortran 90 code to read this data into an array, as below:
PROGRAM autocorr
implicit none
INTEGER, PARAMETER :: TRUN=4000,TCOR=1800
real,dimension(TRUN) :: angle
real :: temp, temp2, average1, average2
integer :: i, j, p, q, k, count1, t, count2
REAL, DIMENSION(0:TCOR) :: ACF
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
open(100, file="fort.64",status="old")
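do k = 1,TRUN
read(100,*) angle(k)   ! list-directed read, one value per record
end do
close(100)
! ... rest of the program not shown ...
END PROGRAM autocorr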
Then, when I print the values again to check them, I get
25.934999
25.549999
24.274000
29.936001
23.122000
27.360001
28.153999
24.320000
28.613001
27.601000
29.948000
29.367001
32.122002
33.818001
21.837000
29.283001
26.489000
24.010000
27.698000
30.799999
36.157001
29.034000
34.700001
26.058001
29.114000
24.177000
25.209000
25.820999
26.620001
29.761000
Why do the values now have 6 decimal places?
How can I avoid this so that it doesn't affect the calculation results?
Appreciate any help.
Thanks

You don't show the statement you use to write the values out again. I suspect, therefore, that you've used Fortran's list-directed output, something like this:
write(output_unit,*) angle(k)
If you have done this, you have surrendered control over how many digits the program displays to the compiler. That's what using * in place of an explicit format means: the standard says that the compiler can use any reasonable representation of the number.
What you are seeing, therefore, is your numbers displayed with 8 significant figures, which is about what single-precision floating-point numbers provide. If you wanted to display the numbers with only 3 digits after the decimal point, you could write
write(output_unit,'(f8.3)') angle(k)
or some variation thereof.
You've declared angle to be of type real; unless you've overridden the default with a compiler flag, this means that you are using single-precision IEEE 754 floating-point numbers (on anything other than an exotic computer). Bear in mind too that most real (in the mathematical sense) numbers do not have an exact representation in floating point, and that the single-precision decimal approximation to the exact number 25.935 is likely to be 25.934999; the other numbers you print seem to be the floating-point approximations to the numbers your program reads.
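The effect can be checked without a Fortran compiler. A minimal Python sketch, assuming default `real` is IEEE 754 single precision as described above; `to_float32` is a hypothetical helper that rounds a double through the `struct` module's 4-byte `"f"` format, and the `8.3f` format spec plays the role of Fortran's `f8.3` edit descriptor:

```python
import struct


def to_float32(x: float) -> float:
    """Round a double to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


for value in (25.935, 25.550):
    single = to_float32(value)
    print(f"{single:.6f}", f"{single:8.3f}")
# 25.934999   25.935
# 25.549999   25.550
```

The stored value is the same either way; only the explicit format decides how many digits are shown.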
If you really want to compute your results with a lower precision, then you are going to have to employ some clever programming techniques.

We Keep Coding
