Progress 4GL/OpenEdge Moving Integer using OCX

How do I create a module with an OCX that makes a FILL-IN (INT) spin like a lottery number display? I tried searching the net for how to do it, but no luck.

You are over-complicating it. There is no need for an OCX or multiple fill-ins: the loop below spends about a second flashing random values in the not-yet-fixed digits, then locks in one more leading digit on each pass:
define variable r as integer no-undo.
define variable z as integer no-undo format "999999".
define variable k as integer no-undo.
do k = 6 to 1 by -1:
etime( true ).
do while etime < 1000: /* spend 1000ms on each random digit */
r = random( 0, exp( 10, k ) - 1 ).
display ( z + r ) @ z.
end.
z = z + ( random( 0, 9 ) * exp( 10, k - 1 )).
display z.
/* pause. */ /* handy for debugging... */
end.
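The @ z phrase displays the changing value in the same frame field that z occupies, so the digits appear to spin in place. If the target has to be a GUI FILL-IN widget rather than a frame field, the same loop should work by assigning the intermediate values to the widget's SCREEN-VALUE attribute instead of using display (untested, but SCREEN-VALUE is the usual way to poke a widget from code).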

Related

Implementing Montgomery Modular Reduction/Multiplication (MMM)

I have been trying to implement Montgomery Modular Reduction in Verilog and encountered an error while doing so. Attaching the code below:
module MMM ( a , b , c , y ) ;
// Parameters
//
parameter N = 32 ; // Default value of N
// Inputs
//
input [N-1:0] a ; // N-bit input a
input [N-1:0] b ; // N-bit input b
input [N-1:0] c ; // N-bit input c
// Outputs
//
output [N-1:0] y ; // N-bit output y
// Internal nets
//
wire [N-1:0] q ; // N-bit q array
//wire [N+1:0] t [0:N-1] ; // (N+2)-bit temporary iteration variable t, bus array of N
wire [N+1:0] s ; //
// Initial value of S
//
assign s[0] = 0 ;
// Iteration
//
genvar i ;
generate
for ( i = 0 ; i <= N-1 ; i = i + 1 )
begin : iterate
assign q[i] = (s[i] + a[i] * b) % 2;
assign s[i+1] = (s[i] + q[i] * c + a[i] * b) / 2;
if (s[N] >= c)
assign y = s[N] - c ;
else
assign y = s[N] ;
end // iterate
endgenerate
//assign MMM[a, b, c] = y;
endmodule
The error: The generate if condition must be a constant expression.
Any help would be great. Thanks.
The problem is that an if inside a generate block must be decidable at compilation time, and you are using a signal as the condition of the if. I understand that what you mean is to create a mux with that expression as the selector, but the compiler doesn't.
You could wrap the iterate logic in a procedural always block, and then you would be able to use the if statement.
Also, the assignment to y should be outside the iterate block; otherwise it will have multiple drivers.
Fixing these two problems you have:
module MMM ( a , b , c , y ) ;
// Parameters
//
parameter N = 32 ; // Default value of N
// Inputs
//
input [N-1:0] a ; // N-bit input a
input [N-1:0] b ; // N-bit input b
input [N-1:0] c ; // N-bit input c
// Outputs
//
output [N-1:0] y ; // N-bit output y
// Internal nets
//
wire [N-1:0] q ; // N-bit q array
//wire [N+1:0] t [0:N-1] ; // (N+2)-bit temporary iteration variable t, bus array of N
wire [N+1:0] s ; //
// Initial value of S
//
assign s[0] = 0 ;
// Iteration
//
genvar i ;
generate
for ( i = 0 ; i <= N-1 ; i = i + 1 )
begin : iterate
assign q[i] = (s[i] + a[i] * b) % 2;
assign s[i+1] = (s[i] + q[i] * c + a[i] * b) / 2;
end // iterate
endgenerate
assign y = s[N] >= c ? s[N] - c : s[N];
//assign MMM[a, b, c] = y;
endmodule
Disclaimer: Maybe there are more errors, I did not notice.
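The disclaimer is warranted: one remaining issue is that s is declared as a single (N+2)-bit vector, so s[i] selects one bit rather than the i-th partial sum (the commented-out array declaration for t hints at the shape actually needed). A sketch, untested, of declarations and iteration logic that match the usage, reusing the module's names and the genvar i from above:
wire [N+1:0] s [0:N] ; // one (N+2)-bit partial sum per iteration
assign s[0] = 0 ;
generate
for ( i = 0 ; i <= N-1 ; i = i + 1 )
begin : iterate
// q[i] = (s[i] + a[i]*b) mod 2 depends only on the lowest bits
assign q[i] = s[i][0] ^ ( a[i] & b[0] ) ;
// s[i+1] = (s[i] + q[i]*c + a[i]*b) / 2, built from muxes and a shift
assign s[i+1] = ( s[i] + ( q[i] ? c : {N{1'b0}} ) + ( a[i] ? b : {N{1'b0}} ) ) >> 1 ;
end // iterate
endgenerate
assign y = ( s[N] >= c ) ? s[N] - c : s[N] ;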

Speed in Matlab vs. Julia vs. Fortran

I am playing around with different languages to solve a simple value function iteration problem where I loop over a state-space grid. I am trying to understand the performance differences and how I could tweak each code. For posterity I have posted full-length working examples for each language below. However, I believe that most of the tweaking is to be done in the while loop. I am a bit confused about what I am doing wrong in Fortran, as the speed seems subpar.
Matlab (~2.7 secs): I am avoiding a more efficient solution using the repmat function for now, to keep the codes comparable. The code seems to be automatically multithreaded across 4 threads.
beta = 0.98;
sigma = 0.5;
R = 1/beta;
a_grid = linspace(0,100,1001);
tic
[V_mat, next_mat] = valfun(beta, sigma, R ,a_grid);
toc
where valfun()
function [V_mat, next_mat] = valfun(beta, sigma, R, a_grid)
zeta = 1-1/sigma;
len = length(a_grid);
V_mat = zeros(2,len);
next_mat = zeros(2,len);
u = zeros(2,len,len);
c = zeros(2,len,len);
for i = 1:len
c(1,:,i) = a_grid(i) - a_grid/R + 20.0;
c(2,:,i) = a_grid(i) - a_grid/R;
end
u = c.^zeta * zeta^(-1);
u(c<=0) = -1e8;
tol = 1e-4;
outeriter = 0;
diff = 1000.0;
while (diff>tol) %&& (outeriter<20000)
outeriter = outeriter + 1;
V_last = V_mat;
for i = 1:len
[V_mat(1,i), next_mat(1,i)] = max( u(1,:,i) + beta*V_last(2,:));
[V_mat(2,i), next_mat(2,i)] = max( u(2,:,i) + beta*V_last(1,:));
end
diff = max(abs(V_mat(:) - V_last(:))); % max over all entries; a bare matrix max returns a row vector, which changes the while test
end
fprintf("\n Value Function converged in %i steps. \n", outeriter)
end
Julia (after compilation) ~5.4 secs (4 threads, 9425469 allocations: 22.43 GiB), ~7.8 secs (1 thread, 2912564 allocations: 22.29 GiB)
[EDIT: after adding correct broadcasting and @views it's only 1.8-2.1 seconds now, see below!]
using LinearAlgebra, UnPack, BenchmarkTools
struct paramsnew
β::Float64
σ::Float64
R::Float64
end
function valfun(params, a_grid)
@unpack β, σ, R = params
ζ = 1-1/σ
len = length(a_grid)
V_mat = zeros(2,len)
next_mat = zeros(2,len)
u = zeros(2,len,len)
c = zeros(2,len,len)
@inbounds for i in 1:len
c[1,:,i] = @. a_grid[i] - a_grid/R + 20.0
c[2,:,i] = @. a_grid[i] - a_grid/R
end
u = c.^ζ * ζ^(-1)
u[c.<=0] .= typemin(Float64)
tol = 1e-4
outeriter = 0
test = 1000.0
while test>tol
outeriter += 1
V_last = deepcopy(V_mat)
@inbounds Threads.@threads for i in 1:len # loop over grid points
V_mat[1,i], next_mat[1,i] = findmax( u[1,:,i] .+ β*V_last[2,:])
V_mat[2,i], next_mat[2,i] = findmax( u[2,:,i] .+ β*V_last[1,:])
end
test = maximum( abs.(V_mat - V_last)[.!isnan.( V_mat - V_last )])
end
print("\n Value Function converged in ", outeriter, " steps.")
return V_mat, next_mat
end
a_grid = collect(0:0.1:100)
p1 = paramsnew(0.98, 1/2, 1/0.98);
@time valfun(p1,a_grid)
print("\n should be compiled now \n")
@btime valfun(p1,a_grid)
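(With BenchmarkTools it is usually worth interpolating the global arguments, i.e. @btime valfun($p1, $a_grid), so the reported time does not include untyped-global overhead.)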
Fortran (O3, mkl, qopenmp) ~9.2 secs: I also must be doing something wrong when declaring the OpenMP variables, as the program will crash for some grid sizes when OpenMP is used (SIGSEGV error).
module mod_calc
use omp_lib
implicit none
integer, parameter :: dp = selected_real_kind(33,4931), len = 1001
public :: dp, len
contains
subroutine linspace(from, to, array)
real(dp), intent(in) :: from, to
real(dp), intent(out) :: array(:)
real(dp) :: range
integer :: n, i
n = size(array)
range = to - from
if (n == 0) return
if (n == 1) then
array(1) = from
return
end if
do i=1, n
array(i) = from + range * (i - 1) / (n - 1)
end do
end subroutine
subroutine calc_val()
real(dp):: bbeta, sigma, R, zeta, tol, test
real(dp):: a_grid(len), V_mat(2,len), V_last(2,len), &
u(len,len,2), c(len,len,2)
integer :: outeriter, i, sss, next_mat(2,len), fu
character(len=*), parameter :: FILE_NAME = 'data.txt' ! File name.
call linspace(from=0._dp, to=100._dp, array=a_grid)
bbeta = 0.98
sigma = 0.5
R = 1.0/0.98
zeta = 1.0 - 1.0/sigma
tol = 1e-4
test = 1000.0
outeriter = 0
do i = 1,len
c(:,i,1) = a_grid(i) - a_grid/R + 20.0
c(:,i,2) = a_grid(i) - a_grid/R
end do
u = c**zeta * 1.0/zeta
where (c<=0)
u = -1e6
end where
V_mat = 0.0
next_mat = 0.0
do while (test>tol .and. outeriter<20000)
outeriter = outeriter+1
V_last = V_mat
!$OMP PARALLEL DEFAULT(NONE) &
!$OMP SHARED(V_mat, next_mat,V_last, u, bbeta) &
!$OMP PRIVATE(i)
!$OMP DO SCHEDULE(static)
do i=1,len
V_mat(1,i) = maxval(u(:,i,1) + bbeta*V_last(2,:))
next_mat(1,i) = maxloc(u(:,i,1) + bbeta*V_last(2,:),1)
V_mat(2,i) = maxval(u(:,i,2) + bbeta*V_last(1,:))
next_mat(2,i) = maxloc(u(:,i,2) + bbeta*V_last(1,:),1)
end do
!$OMP END DO
!$OMP END PARALLEL
test = maxval(abs(log(V_last/V_mat)))
end do
end subroutine
end module mod_calc
program main
use mod_calc
implicit none
integer:: clck_counts_beg,clck_rate,clck_counts_end
call omp_set_num_threads(4)
call system_clock ( clck_counts_beg, clck_rate )
call calc_val()
call system_clock ( clck_counts_end, clck_rate )
write (*, '("Time = ",f6.3," seconds.")') (clck_counts_end - clck_counts_beg) / real(clck_rate)
end program main
There should be ways to reduce the number of allocations (Julia reports 32-45% gc time!) but for now I am too novice to see them, so any comments and tips are welcome.
Edit:
Adding @views and correct broadcasting to the while loop improved the Julia speed considerably (as expected, I guess), and it hence beats the Matlab loop now. With 4 threads the code now takes only 1.97 secs. Specifically,
@inbounds for i in 1:len
c[1,:,i] = @views @. a_grid[i] - a_grid/R + 20.0
c[2,:,i] = @views @. a_grid[i] - a_grid/R
end
u = @. c^ζ * ζ^(-1)
@. u[c<=0] = typemin(Float64)
while test>tol && outeriter<20000
outeriter += 1
V_last = deepcopy(V_mat)
@inbounds Threads.@threads for i in 1:len # loop over grid points
V_mat[1,i], next_mat[1,i] = @views findmax( @. u[1,:,i] + β*V_last[2,:])
V_mat[2,i], next_mat[2,i] = @views findmax( @. u[2,:,i] + β*V_last[1,:])
end
test = @views maximum( @. abs(V_mat - V_last)[!isnan( V_mat - V_last )])
end
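For what it is worth, most of the remaining allocations come from deepcopy(V_mat) and the temporaries built for each findmax. A sketch of an allocation-light version of the loop (untested; shown single-threaded, since a shared scratch buffer would race under Threads.@threads - each thread would need its own buffer):
tmp = zeros(len) # scratch buffer, reused every sweep
V_last = similar(V_mat)
while test > tol && outeriter < 20000
outeriter += 1
copyto!(V_last, V_mat) # reuse the buffer instead of deepcopy
@inbounds for i in 1:len
@views @. tmp = u[1,:,i] + β * V_last[2,:]
V_mat[1,i], next_mat[1,i] = findmax(tmp)
@views @. tmp = u[2,:,i] + β * V_last[1,:]
V_mat[2,i], next_mat[2,i] = findmax(tmp)
end
# cheaper analogue of the maximum/isnan line above (no full temporary arrays)
test = maximum(abs(V_mat[i] - V_last[i]) for i in eachindex(V_mat) if !isnan(V_mat[i] - V_last[i]))
end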
The reason the Fortran is so slow is that it is using quadruple precision - I don't know Julia or Matlab, but it looks as though double precision is being used in those cases. Further, as noted in the comments, some of the loop orders are incorrect for Fortran, and you are also not consistent in your use of precision in the Fortran code: most of your constants are single precision. Correcting all of these leads to the following:
Original: test = 9.83440674663232047922921588613472439E-0005 Time = 31.413 seconds.
Optimised: test = 9.8343643237979391E-005 Time = 0.912 seconds.
Note I have turned off parallelisation for these; all results are single threaded. Code is below:
module mod_calc
!!$ use omp_lib
implicit none
!!$ integer, parameter :: dp = selected_real_kind(33,4931), len = 1001
integer, parameter :: dp = selected_real_kind(15), len = 1001
public :: dp, len
contains
subroutine linspace(from, to, array)
real(dp), intent(in) :: from, to
real(dp), intent(out) :: array(:)
real(dp) :: range
integer :: n, i
n = size(array)
range = to - from
if (n == 0) return
if (n == 1) then
array(1) = from
return
end if
do i=1, n
array(i) = from + range * (i - 1) / (n - 1)
end do
end subroutine
subroutine calc_val()
real(dp):: bbeta, sigma, R, zeta, tol, test
real(dp):: a_grid(len), V_mat(len,2), V_last(len,2), &
u(len,len,2), c(len,len,2)
integer :: outeriter, i, sss, next_mat(len,2), fu
character(len=*), parameter :: FILE_NAME = 'data.txt' ! File name.
call linspace(from=0._dp, to=100._dp, array=a_grid)
bbeta = 0.98_dp
sigma = 0.5_dp
R = 1.0_dp/0.98_dp
zeta = 1.0_dp - 1.0_dp/sigma
tol = 1e-4_dp
test = 1000.0_dp
outeriter = 0
do i = 1,len
c(:,i,1) = a_grid(i) - a_grid/R + 20.0_dp
c(:,i,2) = a_grid(i) - a_grid/R
end do
u = c**zeta * 1.0_dp/zeta
where (c<=0)
u = -1e6_dp
end where
V_mat = 0.0_dp
next_mat = 0.0_dp
do while (test>tol .and. outeriter<20000)
outeriter = outeriter+1
V_last = V_mat
!$OMP PARALLEL DEFAULT(NONE) &
!$OMP SHARED(V_mat, next_mat,V_last, u, bbeta) &
!$OMP PRIVATE(i)
!$OMP DO SCHEDULE(static)
do i=1,len
V_mat(i,1) = maxval(u(:,i,1) + bbeta*V_last(:, 2))
next_mat(i,1) = maxloc(u(:,i,1) + bbeta*V_last(:, 2),1)
V_mat(i,2) = maxval(u(:,i,2) + bbeta*V_last(:, 1))
next_mat(i,2) = maxloc(u(:,i,2) + bbeta*V_last(:, 1),1)
end do
!$OMP END DO
!$OMP END PARALLEL
test = maxval(abs(log(V_last/V_mat)))
end do
Write( *, * ) test
end subroutine
end module mod_calc
program main
use mod_calc
implicit none
integer:: clck_counts_beg,clck_rate,clck_counts_end
!!$ call omp_set_num_threads(2)
call system_clock ( clck_counts_beg, clck_rate )
call calc_val()
call system_clock ( clck_counts_end, clck_rate )
write (*, '("Time = ",f6.3," seconds.")') (clck_counts_end - clck_counts_beg) / real(clck_rate)
end program main
Compilation / linking:
ian@eris:~/work/stack$ gfortran --version
GNU Fortran (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
ian@eris:~/work/stack$ gfortran -Wall -Wextra -O3 jul.f90
jul.f90:36:48:
character(len=*), parameter :: FILE_NAME = 'data.txt' ! File name.
1
Warning: Unused parameter ‘file_name’ declared at (1) [-Wunused-parameter]
jul.f90:35:57:
integer :: outeriter, i, sss, next_mat(len,2), fu
1
Warning: Unused variable ‘fu’ declared at (1) [-Wunused-variable]
jul.f90:35:36:
integer :: outeriter, i, sss, next_mat(len,2), fu
1
Warning: Unused variable ‘sss’ declared at (1) [-Wunused-variable]
Running:
ian@eris:~/work/stack$ ./a.out
9.8343643237979391E-005
Time = 0.908 seconds.
What @Ian Bush says in his answer about the precision is correct. Moreover,
You will likely not need OpenMP for the kind of parallelization you have done in your code. Fortran's intrinsic do concurrent() will automatically parallelize the loop for you (when the code is compiled with the parallel flag of the respective compiler).
Also, the where ... elsewhere construct is slow, as it often requires the creation of a logical mask array that is then applied in a do-loop. You can use do concurrent() in place of where to both avoid the extra temporary array creation and parallelize the computation across multiple cores.
Also, when comparing 64-bit precision numbers, it is good to make sure both values are of the same type and kind, to avoid an implicit type/kind conversion before the comparison is made.
Also, the calculation of a_grid(i) - a_grid/R in computing the c array is redundant and can be avoided in the subsequent line.
Here is the modified optimized parallel Fortran code without any OpenMP,
module mod_calc
use iso_fortran_env, only: dp => real64
implicit none
integer, parameter :: len = 1001
public :: dp, len
contains
subroutine linspace(from, to, array)
real(dp), intent(in) :: from, to
real(dp), intent(out) :: array(:)
real(dp) :: range
integer :: n, i
n = size(array)
range = to - from
if (n == 0) return
if (n == 1) then
array(1) = from
return
end if
do concurrent(i=1:n)
array(i) = from + range * (i - 1) / (n - 1)
end do
end subroutine
subroutine calc_val()
implicit none
real(dp) :: bbeta, sigma, R, zeta, tol, test
real(dp) :: a_grid(len), V_mat(len,2), V_last(len,2), u(len,len,2), c(len,len,2)
integer :: outeriter, i, j, k, sss, next_mat(len,2), fu
character(len=*), parameter :: FILE_NAME = 'data.txt' ! File name.
call linspace(from=0._dp, to=100._dp, array=a_grid)
bbeta = 0.98_dp
sigma = 0.5_dp
R = 1.0_dp/0.98_dp
zeta = 1.0_dp - 1.0_dp/sigma
tol = 1e-4_dp
test = 1000.0_dp
outeriter = 0
do concurrent(i=1:len)
c(1:len,i,2) = a_grid(i) - a_grid/R
c(1:len,i,1) = c(1:len,i,2) + 20.0_dp
end do
u = c**zeta * 1.0_dp/zeta
do concurrent(i=1:len, j=1:len, k=1:2)
if (c(i,j,k)<=0._dp) u(i,j,k) = -1e6_dp
end do
V_mat = 0.0_dp
next_mat = 0.0_dp
do while (test>tol .and. outeriter<20000)
outeriter = outeriter + 1
V_last = V_mat
do concurrent(i=1:len)
V_mat(i,1) = maxval(u(:,i,1) + bbeta*V_last(:, 2))
next_mat(i,1) = maxloc(u(:,i,1) + bbeta*V_last(:, 2),1)
V_mat(i,2) = maxval(u(:,i,2) + bbeta*V_last(:, 1))
next_mat(i,2) = maxloc(u(:,i,2) + bbeta*V_last(:, 1),1)
end do
test = maxval(abs(log(V_last/V_mat)))
end do
Write( *, * ) test
end subroutine
end module mod_calc
program main
use mod_calc
implicit none
integer:: clck_counts_beg,clck_rate,clck_counts_end
call system_clock ( clck_counts_beg, clck_rate )
call calc_val()
call system_clock ( clck_counts_end, clck_rate )
write (*, '("Time = ",f6.3," seconds.")') (clck_counts_end - clck_counts_beg) / real(clck_rate)
end program main
Compiling your original code with the /standard-semantics /F0x1000000000 /O3 /Qip /Qipo /Qunroll /Qunroll-aggressive /inline:all /Ob2 /Qparallel Intel Fortran compiler flags yields the following timing:
original.exe
Time = 37.284 seconds.
Compiling and running the parallel concurrent Fortran code above (on at most 4 cores, if any are used at all) yields:
concurrent.exe
Time = 0.149 seconds.
For comparison, this is MATLAB's timing:
Value Function converged in 362 steps.
Elapsed time is 3.575691 seconds.
One last tip: there are several vectorized array computations and loops in the above code that can still be merged together to improve the speed of your Fortran code even further. For example,
u = c**zeta * 1.0_dp/zeta
do concurrent(i=1:len, j=1:len, k=1:2)
if (c(i,j,k)<=0._dp) u(i,j,k) = -1e6_dp
end do
in the above code can all be merged with the do concurrent loop appearing before it:
do concurrent(i=1:len)
c(1:len,i,2) = a_grid(i) - a_grid/R
c(1:len,i,1) = c(1:len,i,2) + 20.0_dp
end do
If you decide to do so, then you can define an auxiliary variable inverse_zeta = 1.0_dp / zeta to use in the computation of u inside the loop instead of using * 1.0_dp / zeta, thus avoiding the extra division (which is more costly than multiplication), without degrading the readability of the code.
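A sketch of what that merged version might look like (not benchmarked; the if replaces the separate masking pass, and inverse_zeta is the auxiliary variable just mentioned, declared with the other locals):
inverse_zeta = 1.0_dp / zeta
do concurrent(i = 1:len, j = 1:len)
c(j,i,2) = a_grid(i) - a_grid(j)/R
c(j,i,1) = c(j,i,2) + 20.0_dp
if (c(j,i,2) > 0._dp) then
u(j,i,2) = c(j,i,2)**zeta * inverse_zeta
else
u(j,i,2) = -1e6_dp
end if
if (c(j,i,1) > 0._dp) then
u(j,i,1) = c(j,i,1)**zeta * inverse_zeta
else
u(j,i,1) = -1e6_dp
end if
end do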

Convert this equation from Matlab to TCL

I want to convert this equation from Matlab to Tcl. Can anyone help?
t = 2*pi*rand(n,1);
r = R*sqrt(rand(n,1));
x = x0 + r.*cos(t);
y = y0 + r.*sin(t);
Though there are many similarities, MATLAB has one key difference from Tcl: as well as single values, it can work on vectors, matrices and tensors. This makes translating your code a little more involved. (Apart from that, = maps to set, and expressions are computed with expr.)
# Example initial parameters
set n 123
set R 321
set x0 456
set y0 654
# The translation of your code
set x {}
set y {}
set pi 3.1415927
for {set i 0} {$i < $n} {incr i} {
set t [expr {2 * $pi * rand()}]
set r [expr {$R * sqrt(rand())}]
lappend x [expr {$x0 + $r * cos($t)}]
lappend y [expr {$y0 + $r * sin($t)}]
}
Note that shorter forms than this are possible through the use of an extension package such as VecTcl.
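Even without an extension, Tcl 8.6's lmap can tighten the loop a little. A sketch that collects {x y} pairs instead of two parallel lists (note each point must reuse the same t and r for both coordinates):
set points [lmap _ [lrepeat $n {}] {
set t [expr {2 * $pi * rand()}]
set r [expr {$R * sqrt(rand())}]
list [expr {$x0 + $r * cos($t)}] [expr {$y0 + $r * sin($t)}]
}]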

python optimising multiple functions with common variables

I am trying to minimize (globally) 3 functions that use common variables. I tried to combine them into one function and minimize that using L-BFGS-B (I need to set boundaries for the variables), but it has proven very difficult to balance each term with weightings, i.e. when one is minimised the others will not be. I also tried to use the SLSQP method to minimize one of them while setting the others as constraints, but the constraints are often ignored/not met.
Here is what needs to be minimized. All the maths is done in meritscalculation, and meritoflength, meritofROC, meritofproximity, heightorder are returned from the calculations as globals.
def lengthmerit(x0):
meritscalculation(x0)
print meritoflength
return meritoflength
def ROCmerit(x0):
meritscalculation(x0)
print meritofROC
return meritofROC
def proximitymerit(x0):
meritscalculation(x0)
print meritofproximity+heightorder
return meritofproximity+heightorder
I want to minimize all of them using a common x0 (with boundaries) as the independent variable. Is there a way to achieve this?
Is this what you want to do?
minimize a * amerit(x) + b * bmerit(x) + c * cmerit(x)
over a, b, c, x:
a + b + c = 1
a >= 0.1, b >= 0.1, c >= 0.1 (say)
x in xbounds
If x is say [x0 x1 .. x9], set up a new variable abcx = [a b c x0 x1 .. x9],
constrain a + b + c = 1 with a penalty term added to the objective function,
and minimize this:
def fabc( abcx ):
""" abcx = a, b, c, x
-> a * amerit(x) + ... + penalty 100 (a + b + c - 1)^2
"""
a, b, c, x = abcx[0], abcx[1], abcx[2], abcx[3:] # split
fa = a * amerit(x)
fb = b * bmerit(x)
fc = c * cmerit(x)
penalty = 100 * (a + b + c - 1) ** 2 # 100 ?
f = fa + fb + fc + penalty
print "fabc: %6.2g = %6.2g + %6.2g + %6.2g + %6.2g a b c: %6.2g %6.2g %6.2g" % (
f, fa, fb, fc, penalty, a, b, c )
return f
and bounds = [[0.1, 0.5]] * 3 + xbounds, i.e. each of a b c in 0.1 .. 0.5 or so.
The long prints should show you why one of a b c approaches 0 --
maybe one of amerit() bmerit() cmerit() is way bigger than the others?
Plots instead of prints would be easy too.
Summary:
1) formulate the problem clearly on paper, as at the top
2) translate that into python.
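A sketch of the driver call for step 2, under the same assumptions (fabc, the merit functions and xbounds are the hypothetical names used above):
import numpy as np
from scipy.optimize import minimize

x_start = np.zeros(10) # placeholder initial guess for x
abcx0 = np.concatenate(([1/3., 1/3., 1/3.], x_start)) # equal weights to start
bounds = [(0.1, 0.5)] * 3 + list(xbounds)
res = minimize(fabc, abcx0, method="L-BFGS-B", bounds=bounds)
a, b, c, xbest = res.x[0], res.x[1], res.x[2], res.x[3:]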
Here is the result of some scaling and weighting.
objective function:
merit_function=wa*meritoflength*1e3+wb*meritofROC+wc*meritofproximity+wd*heightorder*10+1000 * (wa+wb+wc+wd-1) ** 2
input:
abcdex=np.array(( 0.5, 0.5, 0.1, 0.3, 0.1...))
output:
fun: array([ 7.79494644])
x: array([ 4.00000000e-01, 2.50000000e-01, 1.00000000e-01,
2.50000000e-01...])
meritoflength : 0.00465499380753. #target 1e-5, usually start at 0.1
meritofROC: 23.7317956542 #target ~1, range <33
Heightorder: 0 #target :strictly 0, range <28
meritofproximity : 0.0 #target:less than 0.02, range <0.052
I realised after a few runs that all the weightings tend to stay at the minimum values of the bounds, and I'm back to manually tuning the scaling problem I started with.
Is there a possibility that my optimisation function isn't finding the true global minimum?
Here is how I minimised it:
minimizer_kwargs = {"method": "L-BFGS-B", "bounds": bnds, "tol":1e0 }
ret = basinhopping(merit_function, abcdex, minimizer_kwargs=minimizer_kwargs, niter=10)
zoom = ret['x']
res = minimize(merit_function, zoom, method = 'L-BFGS-B', bounds=bnds, tol=1e-6)

lapack - addressing for fully packed rectangular format

I would like to use the LAPACK routines for factorisation and inversion of matrices using the fully packed rectangular format (RFP), as this requires only n(n+1)/2 elements to be stored for a symmetric n-by-n matrix. So far, I am setting up the matrix in 'packed' format and transforming it by calling routine DTPTTF. However, this requires a second array. I would like to build my matrix directly in fully packed rectangular format (to save on space). Is there an 'addressing' function which will give me the position of the (i,j)-th element, or could somebody point me to the relevant formula?
To partly answer my own question: inspecting the source code of DTPTTF and the example given therein, I've worked out the address for one of the four possible constellations (the only one I need), namely uplo = 'L' and transr = 'N'. Below is my Fortran function:
! ==================================== ! returns address for RFP format
integer function ijfprf( ii, jj, n ) ! for row jj and column ii
! ==================================== ! for UPLO = 'L' and TRANSR = 'N' only!
implicit none
integer, intent(in) :: ii, jj, n
integer :: i, j, k, n1, k1
if( ii <= jj ) then
i = ii; j = jj
else
i = jj; j = ii
end if
k = n/2
if( mod(n,2) == 0 ) then ! n even
n1 = n + 1
if( i <= k ) then
ijfprf = 1 + (i - 1) * n1 + j
else
ijfprf = ( j - k - 1 ) * n1 + i - k
end if
else ! n odd
k1 = k + 1
if( i > k1 ) then
ijfprf = ( j - k1 ) * n + i - k1
else
ijfprf = ( i - 1 ) * n + j
end if
end if
return
end function ijfprf
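With an addressing function like this, the matrix can be built directly in RFP storage (a plain array of n*(n+1)/2 elements) and then handed to the RFP routines such as DPFTRF/DPFTRI, skipping the DTPTTF conversion pass entirely. A usage sketch under the same uplo = 'L', transr = 'N' convention (aval is a hypothetical function producing element (i,j), and dp is whatever real kind you use):
real(dp) :: arf( n*(n+1)/2 )
integer :: i, j
do j = 1, n
do i = j, n
arf( ijfprf(i, j, n) ) = aval(i, j) ! fill the lower triangle straight into RFP
end do
end do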