Getting double precision in Fortran 90 using the Intel 11.1 compiler

I have a very large code, written in Fortran, that sets up and iteratively solves a system of non-linear partial differential equations. I need all variables to be double precision. In the additional module that I have written for the code, I declare all variables as double precision, but my module still uses variables from the old source code that are declared as type real. So my question is, what happens when a single-precision variable is multiplied by a double-precision variable in Fortran? Is the result double precision if the variable used to store the value is declared as double precision? And what if a double-precision value is multiplied by a constant without the "D0" at the end? Can I just set a compiler option in Intel 11.1 to make all real variables, double-precision variables, and constants double precision?

So my question is, what happens when a single-precision variable is multiplied by a double precision variable in fortran? The single-precision variable is promoted to double precision and the operation is done in double precision.
Is the result double precision if the variable used to store the value is declared as double precision? Not necessarily. The right-hand side is an expression that doesn't "know" about the precision of the variable on the left-hand side, into which it will be stored. If you have Double = SingleA * SingleB (using names to indicate the types), the calculation will be performed in single precision, then converted to double for storage. This will NOT gain extra precision for the calculation!
And what if a double precision value is multiplied by a constant without the "D0" at the end? This is just like the first question: the constant will be promoted to double precision and the calculation done in double precision. However, the constant is still single precision, and even if you wrote down as many digits as for a double-precision constant, the internal storage is single precision and cannot represent that accuracy. For example, DoubleVar * 3.14159265359 will be calculated in double precision, but will be something approximating DoubleVar * 3.14159 done in double precision.
If you want the compiler to retain many digits in a constant, you must specify the precision of the constant. The Fortran 90 way to do this is to define your own real type with whatever precision you need, e.g., to require at least 14 decimal digits:
integer, parameter :: DoubleReal_K = selected_real_kind (14)
real (DoubleReal_K) :: A
A = 5.0_DoubleReal_K
A = A * 3.14159265359_DoubleReal_K
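The two effects described above can be reproduced outside Fortran. What follows is a minimal Python sketch, not Fortran: it assumes IEEE 754 single and double precision, and the helper `to_single` is a hypothetical name that emulates storing a value in a default-kind real by round-tripping it through a 4-byte float.

```python
import struct

def to_single(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# A constant written with many digits but stored in single precision,
# next to the same constant kept in double precision.
pi_single = to_single(3.14159265359)
pi_double = 3.14159265359

double_var = 5.0
mixed = double_var * pi_single       # double * (single constant promoted to double)
full = double_var * pi_double        # double * double constant
constant_error = abs(mixed - full)   # error inherited from the single constant

# Double = SingleA * SingleB: the product is rounded to single before storage.
single_a = to_single(1.1)
single_b = to_single(1.3)
product_in_single = to_single(single_a * single_b)  # what the assignment would store
product_in_double = single_a * single_b             # what DBLE() on the operands would give

print(constant_error)
print(product_in_single - product_in_double)
```

Both printed differences are nonzero: promotion happens in double precision, but it cannot restore digits that the single-precision constant or the single-precision product never held.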

The Fortran standard is very specific about this; other languages are like this, too, and it's really what you'd expect. If an expression contains an operation on two floating-point variables of different precisions, then the expression is of the type of the higher-precision operand, e.g.,
(real variable) + (double variable) -> (double)
(double variable)*(real variable) -> (double)
(double variable)*(real constant) -> (double)
etc.
Now, if you are storing the result in a lower-precision floating-point variable, it'll get down-converted again. But if you are storing it in a variable of the higher precision, it'll maintain its precision.
If there are any cases where you're concerned that a single-precision floating-point variable is causing a problem, you can force it to be converted to double precision using the DBLE() intrinsic:
DBLE(real variable) -> double

If you write a number in the form 0.1D0, the compiler treats it as a double-precision number; if you write just 0.1, it is a single-precision constant, and precision is lost when it is converted to double.
Here is an example:
program main
implicit none
real(8) a,b,c
a=0.2D0
b=0.2
c=0.1*a
print *,a,b,c
end program
When compiled with
ifort main.f90
I get results:
0.200000000000000 0.200000002980232 2.000000029802322E-002
When compiled with
ifort -r8 main.f90
I get results:
0.200000000000000 0.200000000000000 2.000000000000000E-002
If you use the IBM XLF compiler, the equivalent is
xlf -qautodbl=dbl4 main.f90
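The numbers printed by the example above can be reproduced with a short Python sketch. This is an emulation, not Fortran: `to_single` is a hypothetical helper that rounds a value to IEEE 754 single precision, and the effect of -r8 is modelled simply as "literals without D0 become double", which is what the output above suggests.

```python
import struct

def to_single(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Default compilation: literals without D0 are single precision.
a = 0.2                  # a = 0.2D0
b = to_single(0.2)       # b = 0.2   (single constant converted to double)
c = to_single(0.1) * a   # c = 0.1*a (single constant promoted, then multiplied)

# Modelled -r8 behaviour: the same literals are treated as double precision.
b_r8 = 0.2
c_r8 = 0.1 * a

print(f"{a:.15f} {b:.15f} {c:.15E}")
print(f"{a:.15f} {b_r8:.15f} {c_r8:.15E}")
```

The trailing digits in `b` and `c` in the first line come entirely from the single-precision constants, not from the double-precision arithmetic.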

Jonathan Dursi's answer is correct - the other part of your question was if there was a way to make all real variables double precision.
You can accomplish this with the ifort compiler by using the -i8 (for integers) and -r8 (for reals) options. I'm not sure if there is a way to force the compiler to interpret literals as double-precision without specifying them as such (e.g. by changing 3.14159265359 to 3.14159265359D0) - we ran into this issue a while back.

Related

Division not working properly in Swift

Here is my code:
println(Double(2/5))
When I run this, it prints out
0.0
How can I fix this? I want it to come out to 0.4. Is there some issue with the rounding?
The problem is that you're not converting to a Double until after you've done integer division between two integers. Let's take a look at order of operations. We start at the inside and move outward.
Perform integer division between the integer 2 and the integer 5, which results in the integer 0.
Create a double from the integer 0, which creates the double 0.0.
Call description on the double 0.0, which returns the string "0.0"
Call println on the string "0.0"
We can fix this by calling the Double constructor on each side of the division before we divide them.
println((Double(2)/Double(5)))
Now the order of operations is:
Convert the integer 2 to the floating point 2.0
Convert the integer 5 to the floating point 5.0
Perform floating point division between these floating point numbers, resulting in 0.4
Call description on the floating point number 0.4, which returns the string "0.4".
Call println on the string "0.4".
Note that it's not strictly necessary to convert both sides of the division to Double.
And as long as we're dealing with literals, we can just write println(2.0/5.0).
We could also get away with writing println((2 * 1.0)/5) which should now interpret all of our literals as floating point (as we've multiplied it by a floating point).
As long as either side of a math operation is a floating-point type, the integer literal will be interpreted as a floating-point type by Swift, but in my opinion, it's far better to explicitly convert our types so that we're excruciatingly clear on exactly what we want to happen. So let's get all of our numbers into the same type and be explicitly clear about what we actually want.
If we're dealing with literals, we can add .0 to them to force them as floating point numbers:
println(2.0/5.0)
If we're dealing with variables, we can use a constructor:
let myTwoInt: Int = 2
let myFiveInt: Int = 5
println(Double(myTwoInt)/Double(myFiveInt))
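The order-of-operations point carries over to other languages. Here is a minimal sketch in Python rather than Swift; the assumption is that Python's `//` on two ints stands in for Swift's `/` on two Int values, and the variable names are illustrative only.

```python
# Integer division first, conversion second: like Double(2/5).
truncated_first = float(2 // 5)

# Conversion first, floating-point division second: like Double(2)/Double(5).
converted_first = float(2) / float(5)

# The same idea with variables instead of literals.
my_two_int = 2
my_five_int = 5
from_variables = float(my_two_int) / float(my_five_int)

print(truncated_first, converted_first, from_variables)
```

The first value is 0.0 because the fractional part was already discarded before the conversion; the other two are 0.4.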
I think your issue is that you are dividing two integers, which normally returns an integer.
I had a similar issue in Java; adding a .0 to one of the integers, or converting either one to a double, should fix it.
It's a feature of typed languages that creates a result of the same type as the values being divided.
Digits is correct about the cause; instead of the approach you're taking, try this:
print(2.0 / 5.0)

How should I use the digits function in MATLAB?

I have code that uses the double function several times to convert sym to double. To increase precision, I want to use the digits function.
I want to know whether it is enough to call digits once at the top of the code, or whether I must call digits before every call to double.
digits sets the precision until it is changed again. Calling digits() without any input returns the current precision, so you can verify that it is set correctly.
In many cases digits has absolutely no influence on symbolic variables because an analytical solution is found. This means there are no precision errors unless you convert to double. When converting, digits should be set to at least 16 because this matches double precision.

How to use Bitxor for Double Numbers?

I want to use xor on my double numbers in MATLAB, but bitxor only works for integer numbers. Is there a function that converts double to int in MATLAB?
The functions you are looking for might be int8(number), int16(number), or uint32(number). Any of them will convert a double to an integer, but you must pick the one that best suits the result you want to achieve. Remember that you cannot cast from double to integer without rounding the number.
If I understood you correctly, you could create a function that simply removes the decimal point from the double by multiplying your starting value by 2^n, casts the result to an integer using any of the functions mentioned earlier, performs whatever operation you want, and then returns the decimal point to its original position by dividing the number by 2^n.
Multiplying the starting value by 2^n is a hack that will decrease the rounding error.
The perfect value for n would be the number of digits after the decimal point, if this number is relatively small.
Please also explain why you are trying to do this. This doesn't seem to be the optimal solution.
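The scale, cast, operate, and unscale idea described above can be sketched as follows. This is a Python illustration rather than MATLAB; `scaled_bitxor` is a hypothetical helper name, Python's `round` stands in for the rounding done by the integer cast, and the sketch assumes non-negative inputs.

```python
def scaled_bitxor(a, b, n):
    """XOR two non-negative floats after scaling them by 2**n to integers."""
    scale = 2 ** n
    ia = round(a * scale)      # cast to integer; rounds if bits remain below 2**-n
    ib = round(b * scale)
    return (ia ^ ib) / scale   # undo the scaling after the integer XOR

# 1.5 * 4 = 6 and 2.25 * 4 = 9, so both values are exact integers after scaling.
result = scaled_bitxor(1.5, 2.25, 2)

# Casting without scaling discards the fractional part, as with int8(1.003).
lossy = round(1.003)

print(result, lossy)
```

Because the scale factor is a power of two, the multiplication and division themselves introduce no rounding; only the integer cast can lose information.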
You can just cast to an integer:
a = 1.003
int8(a)
ans =
1
That gives you an 8-bit signed integer; you can also get other sizes, e.g. int16, or unsigned types, e.g. uint8, depending on what you want to do.

Fortran 90: reading an array of real numbers

I have a list of real data in a file. The real data looks like this:
25.935
25.550
24.274
29.936
23.122
27.360
28.154
24.320
28.613
27.601
29.948
29.367
I wrote Fortran 90 code to read this data into an array, as below:
PROGRAM autocorr
implicit none
INTEGER, PARAMETER :: TRUN=4000,TCOR=1800
real,dimension(TRUN) :: angle
real :: temp, temp2, average1, average2
integer :: i, j, p, q, k, count1, t, count2
REAL, DIMENSION(0:TCOR) :: ACF
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
open(100, file="fort.64",status="old")
do k = 1,TRUN
read(100,*) angle(k)
end do
Then, when I print again to see the values, I get
25.934999
25.549999
24.274000
29.936001
23.122000
27.360001
28.153999
24.320000
28.613001
27.601000
29.948000
29.367001
32.122002
33.818001
21.837000
29.283001
26.489000
24.010000
27.698000
30.799999
36.157001
29.034000
34.700001
26.058001
29.114000
24.177000
25.209000
25.820999
26.620001
29.761000
May I know why the values now have 6 decimal places?
How can I avoid this effect so that it doesn't affect the calculation results?
Appreciate any help.
Thanks
You don't show the statement you use to write the values out again. I suspect, therefore, that you've used Fortran's list-directed output, something like this
write(output_unit,*) angle(k)
If you have done this, you have surrendered control of how many digits the program displays to the compiler. That's what the use of * in place of an explicit format means: the standard says that the compiler can use any reasonable representation of the number.
What you are seeing, therefore, is your numbers displayed with 8 significant figures, which is about what single-precision floating-point numbers provide. If you wanted to display the numbers with only 3 digits after the decimal point you could write
write(output_unit,'(f8.3)') angle(k)
or some variation thereof.
You've declared angle to be of type real; unless you've overwritten the default with a compiler flag, this means that you are using single-precision IEEE754 floating-point numbers (on anything other than an exotic computer). Bear in mind too that most real (in the mathematical sense) numbers do not have an exact representation in floating-point and that the single-precision decimal approximation to the exact number 25.935 is likely to be 25.934999; the other numbers you print seem to be the floating-point approximations to the numbers your program reads.
If you really want to compute your results with a lower precision, then you are going to have to employ some clever programming techniques.
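The effect can be checked independently of any Fortran compiler. The sketch below is Python, not Fortran: `to_single` is a hypothetical helper that rounds a value to IEEE 754 single precision, the six-decimal format only mimics what the list-directed output in the question happened to show, and the `8.3f` format plays the role of '(f8.3)'.

```python
import struct

def to_single(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# First few values from the data file in the question.
values = [25.935, 25.550, 24.274, 29.936]
stored = [to_single(v) for v in values]   # what a default real array would hold

# Printing more digits than the input had exposes the nearest representable value.
list_directed_like = [f"{s:.6f}" for s in stored]

# An explicit three-decimal format shows the values as they appear in the file.
three_decimals = [f"{s:8.3f}" for s in stored]

for raw, shown in zip(list_directed_like, three_decimals):
    print(raw, shown)
```

The stored values are the same in both columns; only the number of digits requested at output time differs.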

Making a calculation in Objective-C

I need a variable a = 6700000^2 * (a - b) (2 + sinf(a) + sinf(b)), where a and b are floats between -7 and 7. I need all the precision that floats can give me.
Which data type should a be? Is sinf the proper function to get the best precision out of a and b? And should a and b be in radians or degrees?
Well, I made a mistake when I posted the expression; the correct expression is c=67000000^2*(a-b)(2+sinf(a)+sinf(b)), and my problem is with c. "a" and "b" are floats and they are passed to me as floats; they are really coordinates (latitude and longitude), so that's not my concern. My concern is: when using sinf on them, do I lose any precision? And which type should c be so that I don't lose precision? I'm using a long double variable d to store a sum of multiple different c values, and d comes back as zero when it shouldn't (it should be about 1 or 2), so I was guessing that I was losing some precision when calculating the c terms. I was using c as a double. Can it be that I am losing some precision when calculating c?
Thank you very much for your help.
I can't tell you whether float is good enough for your application. If you need more precision, use double, and then use sin() instead of sinf().
The standard trig functions take angles in radians, as you'll discover if you read the relevant documentation.
Instead of float, you should use double if memory is not a concern. Remember to then change sinf() to sin() and use radians.
If you want the best precision without rolling your own types, you should use double rather than float. In that case, you can just use sin(3). According to the man page, you should pass the argument in radians.
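To get a feel for how much the float path can cost, here is a rough Python sketch, not Objective-C. Several things in it are assumptions: the juxtaposition in the posted expression is read as multiplication, `to_single` is a hypothetical helper that rounds to IEEE 754 single precision, `to_single(math.sin(x))` only approximates what sinf would return, and the inputs 0.5 and 0.25 are arbitrary test values in radians.

```python
import math
import struct

def to_single(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def c_double(a, b):
    """c = 67000000^2 * (a - b) * (2 + sin(a) + sin(b)), evaluated in double."""
    return 67000000.0 ** 2 * (a - b) * (2.0 + math.sin(a) + math.sin(b))

def c_single(a, b):
    """Same expression with every intermediate rounded to single precision."""
    scale = to_single(to_single(67000000.0) ** 2)
    diff = to_single(a - b)
    trig = to_single(to_single(2.0 + to_single(math.sin(a))) + to_single(math.sin(b)))
    return to_single(to_single(scale * diff) * trig)

# Inputs arrive as floats, so both versions start from single-precision values.
a = to_single(0.5)
b = to_single(0.25)

exact = c_double(a, b)
approx = c_single(a, b)
relative_error = abs(approx - exact) / abs(exact)

print(exact, approx, relative_error)
```

The relative error of a single c stays small, but because c is on the order of 1e15, the absolute error is large, which matters when many such terms are summed and expected to nearly cancel.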