convert integer to long double

convert integer to long double - fortran90

I have to create a long double random generator. I am thinking about a linear congruential generator, because I don't need a high-precision random sequence. But how can I convert an integer to a long double?

The conversion is implicit on assignment (tested with gcc):
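program test
integer i
real*10 d ! non-standard: 80-bit extended precision in gfortran on x86
i = 10000
i = i * 101
d = i ! integer converted to real*10 implicitly on assignment; real(i, 10) is the explicit form
write (*,*) 'result is', d
end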
Update: I was concerned that something about the extended types might not work well, so I created a more thorough test:
program test
integer i
real f
double precision d
real*10 ld
i = 100000
i = i * 101 + 101
f = i * 1.1
d = f * 1.1
ld = d * 1.1
write (*,*) 'result is', i, f, d, ld
end
[wally@zenetfedora ~]$ ./a.out
result is 10100101 11110111. 12221122.364885688 13443234.892748519537
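For comparison, the same idea can be sketched in C (a minimal illustration, not part of the answer): the integer state of a linear congruential generator converts to long double implicitly on assignment, or explicitly with a cast. The multiplier 1664525 and increment 1013904223 are the widely used Numerical Recipes constants, chosen here only for illustration; lcg_next and lcg_uniform are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical 32-bit LCG state (any seed works). */
static uint32_t lcg_state = 12345u;

/* Advance the LCG: state = 1664525 * state + 1013904223 (mod 2^32).
   Unsigned arithmetic wraps modulo 2^32, which supplies the modulus. */
uint32_t lcg_next(void)
{
    lcg_state = 1664525u * lcg_state + 1013904223u;
    return lcg_state;
}

/* Convert the integer state to a long double in [0, 1).
   The cast makes the integer -> long double conversion explicit;
   dividing by 2^32 maps [0, 2^32) onto [0, 1). */
long double lcg_uniform(void)
{
    return (long double)lcg_next() / 4294967296.0L;
}
```

Each call to lcg_uniform() yields a value in [0, 1). A plain LCG like this is adequate for non-critical uses, but its low-order bits are known to be statistically weak.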

Related

How to convert 1D matlab array code into Fortran code and to get the values

I want to convert this Matlab code into Fortran. The Matlab and Fortran code are both given below, along with the parameters.
Matlab code
L_10 = 1.0e-10;
e = 0.4;
n = 100000;
R = 3.1e+5;
K0_10 = 1.0e-10;
Ci = 1.0e-15;
zv = 1.2;
Dv = 1
Rho = 2.0e-4
dt = 0.01
for i=1:n
L_10(i+1) = L_10(i) + dt*(e*K0_10- R*L_10(i)*Ci- zv*Dv*L_10(i)*Rho);
end
I have written the following Fortran code, but it does not work:
real, dimension (:), allocatable:: L_10
real, parameter :: e = 0.4
integer, parameter :: n = 100000
real, parameter :: R = 3.1e+5
real, parameter :: K0_10 = 1.0e-10
real, parameter :: Ci = 1.0e-15
real, parameter :: zv = 1.2
real, parameter :: Dv = 1.0
real, parameter :: Rho = 2.0e-4
real, parameter :: dt = 0.01
integer:: i
do i=1:n
L_10(i+1) = L_10(i) + dt*(e*K0_10- R*L_10(i)*Ci- zv*Dv*L_10(i)*Rho);
end
How do I initialize the array values in the Fortran code? How does the iteration work in Fortran? The code works perfectly well in Matlab.

This Fortran program produces the same output as the Matlab code. The array iteration in the Fortran do loop differs from Matlab, as shown below: in Fortran each new value is assigned to the whole array without an index, whereas Matlab stores it at an explicit index, L_10(i+1).
program oneDimention
implicit none
integer, parameter :: n = 10
real, dimension (n):: L_10
real, parameter :: e = 0.4
real, parameter :: R = 3.1e+5
real, parameter :: K0_10 = 1.0e-10
real, parameter :: Ci = 1.0e-15
real, parameter :: zv = 1.2
real, parameter :: Dv = 1.0
real, parameter :: Rho = 2.0e-4
real, parameter :: dt = 0.01
integer:: i
L_10 = 1.0e-10
do i=1,n
L_10 = L_10(i) + dt*(e*K0_10- R*L_10(i)*Ci- zv*Dv*L_10(i)*Rho);
print*,L_10(i)
end do
end program oneDimention

I can immediately see two problems with the Fortran. You've chosen to make L_10 allocatable, but the code doesn't allocate it. You could either make it static, by changing its declaration to
real, dimension (n+1) :: L_10
If you choose this approach, you'll have to move the declaration after the declaration of n itself; the compiler won't accept forward references. The alternative would be to leave the declaration as it is but to insert the statement
allocate(L_10(n+1))
after the declarations but before you first use the array. Review your documentation for the allocate statement, and learn how to get a status code reported (the stat= specifier) in case things go awry.
You can then set the value of all elements in the same way you do in Matlab,
L_10 = 1.0e-10
You have the syntax for the do statement wrong, it should start with the line
do i = 1, n
with a comma where Matlab uses the colon.
There may be other problems in the Fortran which I haven't spotted, but your compiler will help you there.
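The same fixes (allocate storage with a status check, set the initial value, index the loop correctly) can be cross-checked with an equivalent sketch in C, the language of the other examples on this page; it is an illustration, not part of the answer. Because C arrays are 0-based, Matlab's L_10(i+1) = f(L_10(i)) for i = 1:n becomes L[i] = f(L[i-1]) for i = 1..n. The function name run_recurrence and the macro names are hypothetical; the parameter values are copied from the question.

```c
#include <stdlib.h>

/* Parameters copied from the question (macro names are hypothetical). */
#define E_PAR  0.4
#define R_PAR  3.1e+5
#define K0_10  1.0e-10
#define CI     1.0e-15
#define ZV     1.2
#define DV     1.0
#define RHO    2.0e-4
#define DT     0.01

/* Allocate n+1 elements, set the initial value, and apply the explicit
   update n times. Returns NULL if allocation fails (the analogue of
   checking stat= after Fortran's allocate). The caller must free(). */
double *run_recurrence(size_t n)
{
    double *L = malloc((n + 1) * sizeof *L);
    if (L == NULL)
        return NULL;
    L[0] = 1.0e-10;                       /* Matlab: L_10 = 1.0e-10 */
    for (size_t i = 1; i <= n; i++)       /* Matlab: for i = 1:n     */
        L[i] = L[i - 1] + DT * (E_PAR * K0_10
                                - R_PAR * L[i - 1] * CI
                                - ZV * DV * L[i - 1] * RHO);
    return L;
}
```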

Fixed point approximation of 2^x, with input range of s5.26

How can I implement 2^x in s5.26 fixed-point arithmetic, with input values in the range [-31.9, 31.9], using a minimax polynomial approximation for exp2()?
How can I generate the polynomial using the Sollya tool mentioned in the following link?
Power of 2 approximation in fixed point

Since fixed-point arithmetic generally does not include an "infinity" encoding representing overflowed results, any implementation of exp2() for an s5.26 format will be limited to inputs in the interval (-32, 5), resulting in outputs in [0, 32).
The computation of transcendental functions typically consists of argument reduction, core approximation, and final result construction. In the case of exp2(a), a reasonable argument reduction scheme is to split a into an integer part i and a fractional part f, such that a == i + f, with f in [-0.5, 0.5]. One then computes exp2(f) and scales the result by 2^i, which corresponds to shifts in fixed-point arithmetic: exp2(a) = exp2(f) * exp2(i).
The common design choices for the computation of exp2(f) are interpolation in a table of exp2() values, or polynomial approximation. Since we need 31 result bits for the largest arguments, accurate interpolation would probably need to be quadratic to keep the table size reasonable. Since many modern processors (including ones used in embedded systems) provide a fast integer multiplier, I will focus here on polynomial approximation. For this, we want a polynomial with minimax properties, that is, one that minimizes the maximum error compared to the reference.
Both commercial and free tools offer built-in capabilities to generate minimax approximations, e.g. Mathematica's MiniMaxApproximation command, Maple's minimax command, and Sollya's fpminimax command. One might also choose to build one's own infrastructure based on the Remez algorithm, which is the approach I have used. Unlike floating-point arithmetic, which typically uses round-to-nearest-even, fixed-point arithmetic is usually restricted to truncation of intermediate results. This introduces additional error during expression evaluation. As a consequence, it is usually a good idea to run a heuristic search for small adjustments to the coefficients of the generated approximation, to partially balance these accumulating one-sided errors.
Because we need up to 31 bits in the result, and because coefficients in core approximations are typically less than unity in magnitude, we cannot use the native fixed-point precision, here s5.26, for polynomial evaluation. Instead, we want to scale up the operands in intermediate computation to fully use the available range of 32-bit integers, by dynamically adjusting the fixed-point format we are working in. For reasons of efficiency, it seems advisable to arrange the computation such that multiplications use re-normalization right shifts by 32 bits. This will often allow the elimination of explicit shifts on 32-bit processors.
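A small sketch of the scale bookkeeping behind the 32-bit re-normalization: if a holds x * 2^p and b holds y * 2^q, the high half of their 64-bit product holds approximately x * y * 2^(p+q-32). The helper mulhi32 mirrors the mulhi function in the code below; the worked values are only illustrative.

```c
#include <stdint.h>

/* High 32 bits of the signed 64-bit product. If a encodes x * 2^p and
   b encodes y * 2^q, the result encodes about x * y * 2^(p + q - 32),
   so the >> 32 is the re-normalization and costs no extra shift on
   32-bit CPUs. For negative products this relies on >> being an
   arithmetic shift, as the answer notes. */
static int32_t mulhi32(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> 32);
}
```

For example, x = 0.75 stored with p = 31 (1610612736) times y = 0.5 stored with q = 31 (1073741824) gives 0.375 stored with scale 2^30, i.e. 402653184.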
Since intermediate computation uses signed data, right shifts of signed, negative operands will occur. We want those right shifts to map to arithmetic right shift instructions, something the C standard does not guarantee. But on most commonly used platforms, C compilers do what is desirable for us. Otherwise, it may be necessary to resort to intrinsics or inline assembly. I developed the code below with the Microsoft compiler on an x64 platform.
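Where a compiler does not implement >> on negative signed values as an arithmetic shift, one possible portable fallback (a sketch, not from the answer; asr32 is a hypothetical name) uses the identity floor(x / 2^n) = ~(~x >> n) for negative x:

```c
#include <stdint.h>

/* Sign-propagating right shift without relying on the
   implementation-defined result of >> on negative operands.
   For x < 0, ~x = -x - 1 is non-negative, so ~x >> n is well defined,
   and ~(~x >> n) equals floor(x / 2^n). Requires 0 <= n < 32. */
static int32_t asr32(int32_t x, unsigned n)
{
    return (x < 0) ? ~(~x >> n) : (x >> n);
}
```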
In the evaluation of the polynomial approximation for exp2(f), the original floating-point coefficients, the dynamic scaling, and the heuristic adjustments are all clearly visible. The code below does not quite achieve full accuracy for large arguments. The biggest absolute error is 1.10233e-7, for the argument of 0x12de9c5b = 4.71739332: fixed_exp2() returns 0x693ab6a3 while the accurate result would be 0x693ab69c. Presumably full accuracy could be achieved by increasing the degree of the polynomial core approximation by one.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <math.h>
/* on 32-bit architectures, there is often an instruction/intrinsic for this */
int32_t mulhi (int32_t a, int32_t b)
{
return (int32_t)(((int64_t)a * (int64_t)b) >> 32);
}
/* compute exp2(a) in s5.26 fixed-point arithmetic */
int32_t fixed_exp2 (int32_t a)
{
int32_t i, f, r, s;
/* split a = i + f, such that f in [-0.5, 0.5] */
i = (a + 0x2000000) & ~0x3ffffff; // 0.5
f = a - i;
s = ((5 << 26) - i) >> 26;
f = f * 32; /* scale up by 2^5 for maximum accuracy in intermediate computation;
               multiply instead of <<, since left-shifting a negative value is undefined in C */
/* approximate exp2(f)-1 for f in [-0.5, 0.5] */
r = (int32_t)(1.53303146e-4 * (1LL << 36) + 996);
r = mulhi (r, f) + (int32_t)(1.33887795e-3 * (1LL << 35) + 99);
r = mulhi (r, f) + (int32_t)(9.61833261e-3 * (1LL << 34) + 121);
r = mulhi (r, f) + (int32_t)(5.55036329e-2 * (1LL << 33) + 51);
r = mulhi (r, f) + (int32_t)(2.40226507e-1 * (1LL << 32) + 8);
r = mulhi (r, f) + (int32_t)(6.93147182e-1 * (1LL << 31) + 5);
r = mulhi (r, f);
/* when argument < -26.5, result underflows to zero; test this before the
   final shift, because s exceeds 31 there and (1U << s) would be undefined */
if (a < -0x6a000000) return 0;
/* add 1, scale based on integral portion of argument, round the result */
r = ((((uint32_t)r * 2) + (uint32_t)(1.0*(1LL << 31)) + ((1U << s) / 2) + 1) >> s);
return r;
}
/* convert from s5.26 fixed point to double-precision floating point */
double fixed_to_float (int32_t a)
{
return a / 67108864.0;
}
int main (void)
{
double a, res, ref, err, maxerr = 0.0;
int32_t x, start, end;
start = -0x7fffffff; // -31.999999985
end = 0x14000000; // 5.000000000
printf ("testing fixed_exp2 with inputs in [%.9f, %.9f)\n",
fixed_to_float (start), fixed_to_float (end));
for (x = start; x < end; x++) {
a = fixed_to_float (x);
ref = exp2 (a);
res = fixed_to_float (fixed_exp2 (x));
err = fabs (res - ref);
if (err > maxerr) {
maxerr = err;
}
}
printf ("max. abs. err = %g\n", maxerr);
return EXIT_SUCCESS;
}
A table-based alternative trades table storage for a reduction in the amount of computation performed. Depending on the size of the L1 data cache, this may or may not increase performance. One possible approach is to tabulate 2^f - 1 for f in [0, 1). Then split the function argument into an integer i and a fraction f, such that f in [0, 1). To keep the table reasonably small, use quadratic interpolation, with the coefficients of the polynomial computed on the fly from three consecutive table entries. The result is slightly adjusted by a heuristically determined offset to partially compensate for the truncating nature of fixed-point arithmetic.
The table is indexed by leading bits of the fraction f. Using seven bits for the index (resulting in a table of 128+2 entries), accuracy is slightly worse than with the previous minimax polynomial approximation. Maximum absolute error is 1.74935e-7. It occurs for an argument of 0x11580000 = 4.33593750, where fixed_exp2() returns 0x50c7d771, whereas the accurate result would be 0x50c7d765.
/* For i in [0,129]: (exp2 (i/128.0) - 1.0) * (1 << 31) */
static const uint32_t expTab [130] =
{
0x00000000, 0x00b1ed50, 0x0164d1f4, 0x0218af43,
0x02cd8699, 0x0383594f, 0x043a28c4, 0x04f1f656,
0x05aac368, 0x0664915c, 0x071f6197, 0x07db3580,
0x08980e81, 0x0955ee03, 0x0a14d575, 0x0ad4c645,
0x0b95c1e4, 0x0c57c9c4, 0x0d1adf5b, 0x0ddf0420,
0x0ea4398b, 0x0f6a8118, 0x1031dc43, 0x10fa4c8c,
0x11c3d374, 0x128e727e, 0x135a2b2f, 0x1426ff10,
0x14f4efa9, 0x15c3fe87, 0x16942d37, 0x17657d4a,
0x1837f052, 0x190b87e2, 0x19e04593, 0x1ab62afd,
0x1b8d39ba, 0x1c657368, 0x1d3ed9a7, 0x1e196e19,
0x1ef53261, 0x1fd22825, 0x20b05110, 0x218faecb,
0x22704303, 0x23520f69, 0x243515ae, 0x25195787,
0x25fed6aa, 0x26e594d0, 0x27cd93b5, 0x28b6d516,
0x29a15ab5, 0x2a8d2653, 0x2b7a39b6, 0x2c6896a5,
0x2d583eea, 0x2e493453, 0x2f3b78ad, 0x302f0dcc,
0x3123f582, 0x321a31a6, 0x3311c413, 0x340aaea2,
0x3504f334, 0x360093a8, 0x36fd91e3, 0x37fbefcb,
0x38fbaf47, 0x39fcd245, 0x3aff5ab2, 0x3c034a7f,
0x3d08a39f, 0x3e0f680a, 0x3f1799b6, 0x40213aa2,
0x412c4cca, 0x4238d231, 0x4346ccda, 0x44563ecc,
0x45672a11, 0x467990b6, 0x478d74c9, 0x48a2d85d,
0x49b9bd86, 0x4ad2265e, 0x4bec14ff, 0x4d078b86,
0x4e248c15, 0x4f4318cf, 0x506333db, 0x5184df62,
0x52a81d92, 0x53ccf09a, 0x54f35aac, 0x561b5dff,
0x5744fccb, 0x5870394c, 0x599d15c2, 0x5acb946f,
0x5bfbb798, 0x5d2d8185, 0x5e60f482, 0x5f9612df,
0x60ccdeec, 0x62055b00, 0x633f8973, 0x647b6ca0,
0x65b906e7, 0x66f85aab, 0x68396a50, 0x697c3840,
0x6ac0c6e8, 0x6c0718b6, 0x6d4f301f, 0x6e990f98,
0x6fe4b99c, 0x713230a8, 0x7281773c, 0x73d28fde,
0x75257d15, 0x767a416c, 0x77d0df73, 0x792959bb,
0x7a83b2db, 0x7bdfed6d, 0x7d3e0c0d, 0x7e9e115c,
0x80000000, 0x8163daa0
};
int32_t fixed_exp2 (int32_t x)
{
int32_t f1, f2, dx, a, b, approx, idx, i, f;
/* extract integer portion; 2**i is realized as a shift at the end */
i = (x >> 26);
/* extract fraction f so we can compute 2^f, 0 <= f < 1 */
f = x & 0x3ffffff;
/* index table of exp2 values using 7 most significant bits of fraction */
idx = (uint32_t)f >> (26 - 7);
/* difference between argument and next smaller sampling point */
dx = f - (idx << (26 - 7));
/* fit parabola through closest 3 sampling points; find coefficients a,b */
f1 = (expTab[idx+1] - expTab[idx]);
f2 = (expTab[idx+2] - expTab[idx]);
a = f2 - (f1 << 1);
b = (f1 << 1) - a;
/* find function value offset for argument x by computing ((a*dx+b)*dx) */
approx = a;
approx = (int32_t)((((int64_t)approx)*dx) >> (26 - 7)) + b;
approx = (int32_t)((((int64_t)approx)*dx) >> (26 - 7 + 1));
/* flush underflow to 0; test this first, because for i < -27 the shift
   count 30 - 26 - i below would exceed 31, which is undefined in C */
if (i < -27) return 0;
/* combine integer and fractional parts of result, round result */
approx = (((expTab[idx] + (uint32_t)approx + (uint32_t)(1.0*(1LL << 31)) + 22U) >> (30 - 26 - i)) + 1) >> 1;
return approx;
}

Mean of structure's line

I have a structure like
struct =
Fields Subject1 Subject2 Subject3 Subject4
1 30000x1 double 30000x1 double 30000x1 double 30000x1 double
2 30000x1 double 30000x1 double 30000x1 double 30000x1 double
3 30000x1 double 30000x1 double 30000x1 double 30000x1 double
4 30000x1 double 30000x1 double 30000x1 double 30000x1 double
where 1, 2, 3 and 4 are conditions.
I would like to calculate the mean for each condition, i.e. for each LINE of the structure.
I tried:
for i = 1:length(struct)
mean_condition(i) = mean([struct(i)]);
end
but I get this error:
Error using sum
Invalid data type. First argument must be numeric or logical.
Error in mean (line 117)
y = sum(x, dim, flag)/size(x,dim);
How can I fix it ?

While structfun allows you to perform an operation over the fields of a structure, it only works with scalar structures. Because you have a structure array, you'll need to use an explicit loop or an implicit arrayfun loop.
As an example of the latter:
condition(1).subject1 = 1:10;
condition(1).subject2 = 1:20;
condition(2).subject1 = 1:30;
condition(2).subject2 = 1:40;
results = arrayfun(@(x)mean(structfun(@mean, x)), condition).';
Which gives us:
results =
8
18
Which we can verify with:
>> [mean([mean(condition(1).subject1), mean(condition(1).subject2)]); mean([mean(condition(2).subject1), mean(condition(2).subject2)])]
ans =
8
18
Depending on the MATLAB version, the *fun functions may be slower than an explicit loop because of additional function-call overhead. This is certainly the case with older versions of MATLAB, but engine improvements have started to bring their performance to parity.
For completeness' sake, an explicit loop version:
results = zeros(numel(condition), 1); % one mean per condition
for ii = 1:numel(condition)
tmpnames = fieldnames(condition(ii));
tmpmeans = zeros(numel(tmpnames), 1); % one mean per field (subject)
for jj = 1:numel(tmpnames)
tmpmeans(jj) = mean(condition(ii).(tmpnames{jj}));
end
results(ii) = mean(tmpmeans); % mean of the per-field means
end

Since the fields of all the structs in the array have the same size, you can perform this computation very easily as follows:
s = struct();
s_len = 4;
for i = 1:s_len
s(i).Subject1 = repmat(i,30,1);
s(i).Subject2 = repmat(i,30,1);
s(i).Subject3 = repmat(i,30,1);
s(i).Subject4 = repmat(i,30,1);
end
m = reshape(mean(cell2mat(struct2cell(s))),s_len,1);
The variable m is then a column vector of double values in which each row contains the mean of the respective condition:
m =
1 % mean of condition 1
2 % mean of condition 2
3 % mean of condition 3
4 % mean of condition 4
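The two answers compute slightly different quantities: the arrayfun version averages the per-field means, while the cell2mat version averages all values of a condition pooled together. They coincide when every field has the same length, as in the question (30000x1 each). A small C illustration using the data of the first answer's condition(1) (fields 1:10 and 1:20) shows the difference: the mean of means is 8, while pooling all 30 values gives 265/30, about 8.83. The names mean_of_means and pooled_mean are hypothetical.

```c
#include <stddef.h>

/* Arithmetic mean of n values (n > 0). */
static double mean(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t k = 0; k < n; k++)
        s += x[k];
    return s / (double)n;
}

/* Mean of the per-field means (what the arrayfun/structfun answer computes). */
double mean_of_means(const double *const fields[], const size_t lens[], size_t nfields)
{
    double s = 0.0;
    for (size_t f = 0; f < nfields; f++)
        s += mean(fields[f], lens[f]);
    return s / (double)nfields;
}

/* Mean over all values pooled together (what cell2mat + mean computes). */
double pooled_mean(const double *const fields[], const size_t lens[], size_t nfields)
{
    double s = 0.0;
    size_t total = 0;
    for (size_t f = 0; f < nfields; f++) {
        for (size_t k = 0; k < lens[f]; k++)
            s += fields[f][k];
        total += lens[f];
    }
    return s / (double)total;
}
```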

inner product in minizinc

For two vectors V1 = (x11, x12) and V2 = (x21, x22), the inner product is V1 • V2 = x11 * x21 + x12 * x22.
I am trying to compute a minimum inner product defined as the sum of x1i * x2j * |i - j|, where i and j are the positions of the coordinates in V1 and V2.
Every coordinate is used once in the sum.
I tried this:
int : vlen;
set of int : LEN = 1..vlen;
set of int : VECS = 1..2;
array[VECS,LEN] of -25..25 : vector;
var -600..700 : sumTotal;
constraint exists(i,j,k,l in LEN where i!=k \/ j!=l)(
exists(v,v2 in VECS)(sumTotal=(vector[v,i] * vector[v2,j] * abs(i-j)+vector[v,k] * vector[v2,l] * abs(k-l)
)));
solve minimize sumTotal;
output ["vector1=["]++[" \(vector[1,j])"|j in LEN]++[" ];\nvector2=["]++[" \(vector[2,j])"|j in LEN]++[" ];\nsumTotal=\(sumTotal);"]
For the data
vlen = 2;
vector = [|-2,3|-4,5|];
I expect:
vector1 = [-2, 3];
vector2 = [-4, 5];
sumTotal = -22;
----------
==========
but I get:
vector1=[ -2 3 ];
vector2=[ -4 5 ];
sumTotal=-40;
----------
==========

I'm afraid I don't understand the meaning of your model, but it does contain some errors in the constraint that should be easy to fix:
If an array is indexed by VECS, LEN, then the second index should always be part of that set.
sum is its own looping structure; it doesn't need a forall expression.
The resulting constraint would be:
constraint sumTotal = sum(i,j in LEN)(
vector[1,i] * vector[2,j] * abs(i-j)
);
This still leaves a rather strange model, so you might want to take a look at the following:
sumTotal is your only variable, but it's defined by parameters. It cannot be optimised as it only has 1 solution.
Should i and j be able to take the same value? If not, then you should use i,j in LEN where i < j.
Do you expect any results other than sumTotal?
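As a quick check of the corrected constraint on the question's data, the same double sum can be written in C (an illustration outside MiniZinc; weighted_inner is a hypothetical name). The diagonal terms vanish because |i - j| = 0, leaving (-2)(5)(1) + (3)(-4)(1) = -22, the value the question expected.

```c
/* Sum over all index pairs (i, j) of v1[i] * v2[j] * |i - j|, i.e. the
   right-hand side of the corrected constraint. Indices are 0-based here;
   |i - j| is the same as with MiniZinc's 1-based indices. */
int weighted_inner(const int v1[], const int v2[], int len)
{
    int total = 0;
    for (int i = 0; i < len; i++)
        for (int j = 0; j < len; j++)
            total += v1[i] * v2[j] * (i > j ? i - j : j - i);
    return total;
}
```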

What does colon mean in fortran?

I'm trying to convert Fortran code to MATLAB, and I was wondering if someone could help me with this subroutine.
Specifically, what does the colon mean in these lines?
SUB Taper (a(), co(), Re(), Im())
FOR nd = 0 TO 31
n1 = 8 * nd: n2 = a(n1 + 4): n1 = a(n1): n0 = 255 - nd
a = .5 * (1 - co(n1)): b = .5 * (1 - co(n2))
Re(nd) = a * Re(nd): Im(nd) = b * Im(nd)
Re(n0) = b * Re(n0): Im(n0) = a * Im(n0)
NEXT
END SUB

The code fragment in your question is not valid Fortran syntax. It is VB, where the colon is used as a statement separator.
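To make the statement-separator reading concrete, here is a sketch of the subroutine in C, with each colon-separated statement written as its own statement in the same left-to-right order. Assumptions: a() holds integer indices, arrays are 0-based (BASIC's default lower bound), and because BASIC dialects such as QBasic keep the array a() and the scalar a as distinct variables, the scalars a and b are renamed wa and wb here. Order matters on the first line: n2 is read using n1 = 8 * nd before n1 is overwritten.

```c
/* C rendering of the BASIC subroutine Taper; each colon-separated
   statement becomes one C statement, executed in the same order. */
void taper(const int a[], const double co[], double re[], double im[])
{
    for (int nd = 0; nd <= 31; nd++) {
        int n1 = 8 * nd;
        int n2 = a[n1 + 4];      /* uses n1 = 8*nd, before it is overwritten */
        n1 = a[n1];
        int n0 = 255 - nd;
        double wa = 0.5 * (1.0 - co[n1]);   /* BASIC scalar a */
        double wb = 0.5 * (1.0 - co[n2]);   /* BASIC scalar b */
        re[nd] = wa * re[nd];
        im[nd] = wb * im[nd];
        re[n0] = wb * re[n0];
        im[n0] = wa * im[n0];
    }
}
```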

Fortran 90 and later allow you to access a single array element given an index, and a subarray given a range of indices separated by colons.
Fortran = Beginning : End : Increment
MATLAB = Beginning : Increment : End
There is a table at the bottom of page 5 in this doc that shows the Fortran and MATLAB equivalents.
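The triplet semantics can be sketched in C: a Fortran section begin:end:step selects max((end - begin + step) / step, 0) elements (integer division truncating toward zero), starting at begin, and MATLAB's begin:step:end selects the same elements. section_count and section_indices are hypothetical helpers, not library functions.

```c
/* Number of elements selected by the Fortran section begin:end:step
   (MATLAB: begin:step:end). step must be nonzero; C's integer division
   truncates toward zero, matching Fortran's rule. */
int section_count(int begin, int end, int step)
{
    int n = (end - begin + step) / step;
    return n > 0 ? n : 0;
}

/* Write the selected indices into out[]; returns how many were written. */
int section_indices(int begin, int end, int step, int out[])
{
    int n = section_count(begin, end, step);
    for (int k = 0; k < n; k++)
        out[k] = begin + k * step;
    return n;
}
```

For example, Fortran a(1:10:3) and MATLAB a(1:3:10) both select elements 1, 4, 7 and 10, and a reversed section such as 10:1:-3 selects 10, 7, 4 and 1.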
