Difference between memory usage when using standard arrays and derived types in fortran 90 - fortran90

I observed some weird behavior regarding the memory usage of derived data types. The following Fortran 90 code demonstrates the issue.
module prec
implicit none
integer, parameter :: d_t = selected_real_kind(15,307)
end module
module typdef
use prec
implicit none
type level_2
real(kind=d_t), allocatable :: lev_3(:)
end type
type level_1
type(level_2),allocatable :: lev_2(:,:)
end type
type array
type(level_1),allocatable :: lev_1(:,:)
end type
end module
program mem_test
use prec
use typdef
implicit none
integer :: n,i,j,k,l,m,egmax,niter,ncells,namom,nsmom
real(kind=d_t),allocatable :: simple_array(:,:,:,:,:)
type(array) :: fancy_array
real(kind=d_t) :: it
egmax=7
niter=2
ncells=3000000
namom=1
nsmom=1
!
!
!
allocate( simple_array(egmax,niter,ncells,namom,nsmom) )
!
!
!
allocate( fancy_array%lev_1(egmax,niter))
do i=1,niter
do j=1,egmax
allocate( fancy_array%lev_1(j,i)%lev_2(ncells,namom) )
end do
end do
do i=1,niter
do j=1,egmax
do k=1,namom
do l=1,ncells
allocate( fancy_array%lev_1(j,i)%lev_2(l,k)%lev_3(nsmom) )
end do
end do
end do
end do
!
do n=1,100000
it=0.0_d_T
do i=1,100000
it=it+1.0_d_t
end do
end do
!
!
deallocate(simple_array)
deallocate(fancy_array%lev_1)
end program
I want to store data in a multi-dimensional array (egmax*niter*ncells*namom*nsmom double precision numbers). I did that in two different ways:
A standard multidimensional array, simple_array(egmax,niter,ncells,namom,nsmom)
A nested derived data structure, fancy_array, as defined in the code above.
I compiled the code using
ifort -g -o test.exe file.f90
I ran it in valgrind and compared the memory consumption of simple_array and fancy_array. simple_array uses about 300 MB, as expected, while fancy_array uses 3 GB (10 times as much) even though it stores the same number of real numbers. Therefore, it should also consume only about 300 MB.
Running a simpler test case, where the derived type is only one level deep, e.g.
type level_1
real(kind=d_t),allocatable :: subarray(:)
end type
type array
type(level_1),allocatable :: lev_1(:)
end type
consumes exactly the amount of memory I am expecting it to. It does not consume 10x as much memory. Has anyone observed similar behavior or has any clue why this would occur? My only idea for the cause is that fancy_array allocates non-contiguous memory and Fortran somehow needs to keep track of it, hence the increase in memory consumption. I would appreciate any input or similar observations.
Thanks for your help.
Sebastian

(Allocatable components are a Fortran 2003 feature.)
The typical way Fortran processors (including Intel Fortran) implement allocatable array objects is with a descriptor: a data structure that contains information such as the location of the array data in memory and the bounds and stride of each dimension of the array, amongst other things.
For Intel Fortran on an x64 platform that descriptor takes 72 bytes for a one-dimensional allocatable array. In your derived type case you have about 42 million such arrays: one for every lev_3 component that you bring into existence, plus a much smaller number for the parent allocatable components. 72 bytes times 42 million gives about 3 GB. There may be further overhead associated with the underlying memory allocator.
On the same platform the descriptor for a rank five array takes 168 bytes, and there is only one memory allocation.
The data storage requirements for the two approaches will be about the same.
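To make the arithmetic above concrete, here is a small back-of-the-envelope sketch in Python. It uses the sizes from the question's program and the descriptor sizes quoted in this answer (72 and 168 bytes for Intel Fortran on x64). It ignores the descriptors of the parent components and any allocator overhead, so treat the result as a rough lower bound, not a measurement.

```python
# Sizes taken from the program in the question.
egmax, niter, ncells, namom, nsmom = 7, 2, 3_000_000, 1, 1

BYTES_PER_REAL = 8   # real(kind=d_t) is double precision
DESC_RANK1 = 72      # bytes per rank-1 descriptor (figure quoted in the answer)
DESC_RANK5 = 168     # bytes per rank-5 descriptor (figure quoted in the answer)

# Both layouts store the same number of reals.
n_values = egmax * niter * ncells * namom * nsmom
data_bytes = n_values * BYTES_PER_REAL

# fancy_array: one rank-1 descriptor for every lev_3 component.
n_lev3 = egmax * niter * ncells * namom
fancy_overhead = n_lev3 * DESC_RANK1

# simple_array: a single rank-5 descriptor.
simple_overhead = DESC_RANK5

print(f"data:            {data_bytes / 1e6:8.0f} MB")
print(f"fancy overhead:  {fancy_overhead / 1e6:8.0f} MB")
print(f"simple overhead: {simple_overhead} bytes")
```

The data itself comes to 336 MB in both cases, while the lev_3 descriptors alone add about 3024 MB, which is in line with the 3 GB reported by valgrind.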
Note that the capability offered by the two approaches is significantly different (hence the difference in overhead): in the derived type case you can change the allocation status, bounds and extent of each lev_3 component. You do not have anywhere near that flexibility in the single array case, where the array, if allocated, must be rectangular.
(In Fortran 90 your component's dimensions in their declarations would need to be constant expressions (fixed at compile time). No descriptors would be used and the memory requirements of the two approaches would converge.)

Related

Do loop to be parallelized in a Matlab's Mex function (Fortran) with OpenMP

I have a do loop in a Matlab's Mex function (written in Fortran) that performs some calculations for each element of a FEM mesh. My mesh consists of 250k elements, so I thought it was worth parallelizing it. This is my first attempt at parallelizing this code with OpenMP (I'm a beginner at coding). I used the reduction command to avoid the race condition in fintk(dofele) = fintk(dofele) + fintele. Is it correct? I can compile it in Matlab without any problem. However, when I use it (in Matlab), it produces correct results for a 12k element mesh and it is faster than the serialized one, but when I try to use it for the 250k element mesh Matlab crashes. Thank you for helping me
subroutine loop_over_elements( &
! OUT
fintk,Sxyz,&
! IN
Elem,Bemesh,Dofelemat,u,dt,NE,NDOF)
use omp_lib
implicit none
mwSize NE, NDOF, ele
integer, parameter :: dp = selected_real_kind(15,307)
real(dp) :: fintk(NDOF), Sxyz(6,NE), Elemat(4,NE), Bemesh(6,12,NE), Dofelemat(12,NE)
real(dp) :: u(NDOF)
real(dp) :: Bele(6,12), fintele(12), uele(12), si(6), dt
integer*4 :: nodes(4), dofele(12)
fintk = 0.D0
!$OMP PARALLEL DO REDUCTION(+:fintk(:)) PRIVATE(ele,nodes,Bele,dofele,uele,si,fintele)
DO ele = 1, NE
nodes = Elemat(1:4,ele)
Bele = Bemesh(1:6,1:12,ele)
dofele = Dofelemat(1:12,ele)
uele = u(dofele)
call comput_subroutine( &
! IN
Bele,uele,dt, &
! OUT
si)
Sxyz(:,ele) = si
fintele = MATMUL(TRANSPOSE(Bele),si)
fintk(dofele) = fintk(dofele) + fintele
END DO
!$OMP END PARALLEL DO
return
end
I solved the "crashing" of Matlab that I was experiencing by adding the line call KMP_SET_STACKSIZE(100000000) in the general mexFunction subroutine, before calling the loop_over_elements subroutine. I figured that, since Matlab didn't crash when I was using the large model with the non-parallelized subroutine, maybe it was a memory problem. After that I discovered the well-known (not to me, unfortunately) segmentation fault problem when using OpenMP with large arrays (see for example this). I'm still a bit confused about the difference between setting OMP_STACKSIZE (which I don't know how to do in Mex functions) and KMP_SET_STACKSIZE, but now the parallelized code works with the large model.

Matlab Horzcat - Out of memory

Any trick to avoid an out of memory error in matlab?
I am assuming that the reason it shows up is because matlab is very inefficient in using horzcat and actually needs to temporarily duplicate matrices.
I have a matrix A with size 108977555 x 25. I want to merge this with three vectors d, m and y with size 108977555 x 1 each.
My machine has 32 GB of RAM, and the above matrix plus the vectors occupy 18 GB.
Now I want to run the following command:
A = [A(:,1:3), d, m, y, A(:,5:end)];
But that yields the error:
Error using horzcat
Out of memory. Type HELP MEMORY for your options.
Any trick to do this merge?
Working with Large Data Sets. If you are working with large data sets, you need to be careful when increasing the size of an array to avoid getting errors caused by insufficient memory. If you expand the array beyond the available contiguous memory of its original location, MATLAB must make a copy of the array and set this copy to the new value. During this operation, there are two copies of the original array in memory.
Restart Matlab. I often find it doesn't fully clean up its memory, or the memory gets fragmented, leading to lower maximal array sizes.
Change your datatype (if you can). E.g. if you're only dealing with numbers 0 - 255, use uint8; the memory size will reduce by a factor of 8 compared to an array of doubles.
Start off with A already large enough (i.e. 108977555x27 instead of 108977555x25) and insert in place:
A(:, 4) = d;
clear d
A(:, 5) = m;
clear m
A(:, 6) = y;
Merge the data into one datatype to reduce the total memory requirement, e.g. a date easily fits into one uint32.
Leave the data separated, think about why you want the data in one matrix in the first place and if that is really necessary.
Use C-code to do the data allocation yourself (only if you're really desperate)
Further reading: https://nl.mathworks.com/help/matlab/matlab_prog/memory-allocation.html
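As a rough check of the "two copies" point against the numbers in the question, here is a small Python sketch. It assumes (this is not stated in the question) that every column costs the same amount of memory, so the reported 18 GB is spread evenly over the 25 columns of A plus d, m and y.

```python
reported_gb = 18.0        # A (25 columns) plus d, m, y, as reported in the question
ram_gb = 32.0             # RAM of the machine, as reported in the question

cols_now = 25 + 3         # columns currently in memory
cols_new = 3 + 3 + 21     # A(:,1:3), d, m, y, A(:,5:end)

# Assumption: all columns have the same element type and so the same size.
gb_per_col = reported_gb / cols_now

new_gb = cols_new * gb_per_col    # the result of the concatenation
peak_gb = reported_gb + new_gb    # old data and new matrix exist side by side

print(f"new matrix: {new_gb:.1f} GB, peak: {peak_gb:.1f} GB, RAM: {ram_gb:.0f} GB")
```

Under that assumption the peak is roughly 35 GB, more than the 32 GB available, which would explain why the one-line concatenation fails while the in-place and file-backed approaches have a chance of working.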
Even if you could make it using Gunther's suggestions, it will just occupy memory. Right now it takes more than half of available memory. So, what are you planning to do then? Even simple B = A+1 doesn't fit. The only thing you can do is stuff like sum, or operations on part of array.
So, you should consider going to tall arrays and other related big data concepts, which are exactly meant to work with such large datasets.
https://www.mathworks.com/help/matlab/tall-arrays.html
You can first try the efficient memory management strategies as mentioned on the official mathworks site : https://in.mathworks.com/help/matlab/matlab_prog/strategies-for-efficient-use-of-memory.html
Use Single (4 bytes) or some other smaller data type instead of Double (8 bytes) if your code can work with that.
If possible use block processing (like rows or columns) i.e. store blocks as separate mat files and load and access only those parts of the matrix which are required.
Use matfile command for loading large variables in parts. Perhaps something like this :
save('A.mat','A','-v7.3')
oldMat = matfile('A.mat');
clear A
newMat = matfile('Anew.mat','Writable',true); %Empty matfile
for i=1:27
if (i<4), newMat.A(:,i) = oldMat.A(:,i); end
if (i==4), newMat.A(:,i) = d; end
if (i==5), newMat.A(:,i) = m; end
if (i==6), newMat.A(:,i) = y; end
if (i>6), newMat.A(:,i) = oldMat.A(:,i-2); end
end

Subscripted assignment dimension mismatch. error in matlab

for i=1:30
Name(i,1)=sprintf('String_%i',i);
end
I'm just confused about what is not working here. This script seems very straightforward: I want to build a list of strings numbered from 1 to 30, but I am getting the error
Subscripted assignment dimension mismatch.
Matlab does not really have strings; it has char arrays. As in almost any programming language, Matlab cannot define a variable without knowing how much memory to allocate. The Java solution would look like this:
String str[] = {"I","am","a","string"};
Similar to the c++ solution:
std::string str[] = {"I","am","another","string"};
The c solution looks different, but is generally the same solution as in c++:
const char* str[] = {"I","am","a","c-type","string"};
However, despite appearances, these are all fundamentally the same in the sense that they all know how much data to allocate even if the elements are not initialized. In particular you can, for example, write:
String str[3];
// Initialize element with an any length string.
The reason is that each element is stored by reference in Java and by a pointer in C and C++. So depending on the platform, each element is either 4 (32-bit) or 8 (64-bit) bytes.
However, in Matlab matrices, data is stored by value. This makes it impossible to store N char arrays in a 1xN or Nx1 matrix. Each element in the matrix must be of type char and have the size of a single char. This means that if you work with strings you need to use the cell data structure (as also suggested by Benoit_11), which stores a reference to any Matlab object in each element.
k = 1:30;
Name = cell(length(k),1);
for i=k
Name{i,1}=sprintf('String_%i',i);
end
Hope that the explanation makes sense to you. I assumed that according to your attempt you have at least some programming experience from at least one other language than matlab.
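To see why the original assignment cannot work, it may help to look at the lengths of the generated strings. The sketch below is in Python, used only as an illustration: the list plays the role of the cell array, and the padded variant shows what a fixed-width char matrix would require.

```python
# Build the same 30 names as the Matlab loop.
names = ["String_%d" % i for i in range(1, 31)]

# The names do not all have the same length, and none is a single character,
# so none of them fits into one char-sized slot like Name(i,1).
lengths = sorted({len(s) for s in names})
print(lengths)

# A list stores one reference per element, like a Matlab cell array,
# so differing lengths are not a problem.
print(names[0], names[-1])

# A rectangular char matrix would need every row padded to the same width.
width = max(lengths)
padded = [s.ljust(width) for s in names]
print(repr(padded[0]), repr(padded[-1]))
```

The names come out 8 or 9 characters long, which is why they need either a container of references (the cell array above) or explicit padding to a common width.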

deallocating array of pointer in fortran

I am new to Fortran. I am facing a problem with array of pointers. I am writing code in Fortran90. Due to large size of data I have to use array of pointers. Pseudocode is like this
type ::a
integer ::Id
real ::Coords(3)
end type a
type ::b
type(a),pointer ::Member
end type b
type(b),allocatable ::data(:)
integer ::i
!allocation and assigning
allocate(data(n))
do i=1,n
allocate(data(i)%Member)
data(i)%Member%Id = i
data(i)%Member%Coords(:) = i*2.0
end do
!Some stuff using data
!De allocation
do i=1,n
deallocate(data(i)%Member)
end do
deallocate(data)
I have observed that deallocating the array of pointers in this way is very slow. When dealing with some millions of data sets, it takes significant time to do this, and if I don't do it, the memory doesn't get released. I want to reuse the data by allocating it with a new size (a new n value in the pseudocode) without exiting the program. So I can't get away without deallocating the array, otherwise there will not be enough memory available for the next steps.
Can anyone suggest another way of deallocating an array of pointers in Fortran? Or another way to handle a large data set in my Fortran program?
Thanks

Does Fortran have inherent limitations on numerical accuracy compared to other languages?

While working on a simple programming exercise, I produced a while loop (DO loop in Fortran) that was meant to exit when a real variable had reached a precise value.
I noticed that due to the precision being used, the equality was never met and the loop became infinite. This is, of course, not unheard of and one is advised that, rather than comparing two numbers for equality, it is best to see if the absolute difference between two numbers is less than a set threshold.
What I found disappointing was how low I had to set this threshold, even with variables at double precision, for my loop to exit properly. Furthermore, when I rewrote a "distilled" version of this loop in Perl, I had no problems with numerical accuracy and the loop exited fine.
Since the code to produce the problem is so small, in both Perl and Fortran, I'd like to reproduce it here in case I am glossing over an important detail:
Fortran Code
PROGRAM precision_test
IMPLICIT NONE
! Data Dictionary
INTEGER :: count = 0 ! Number of times the loop has iterated
REAL(KIND=8) :: velocity
REAL(KIND=8), PARAMETER :: MACH_2_METERS_PER_SEC = 340.0
velocity = 0.5 * MACH_2_METERS_PER_SEC ! Initial Velocity
DO
WRITE (*, 300) velocity
300 FORMAT (F20.8)
IF (count == 50) EXIT
IF (velocity == 5.0 * MACH_2_METERS_PER_SEC) EXIT
! IF (abs(velocity - (5.0 * MACH_2_METERS_PER_SEC)) < 1E-4) EXIT
velocity = velocity + 0.1 * MACH_2_METERS_PER_SEC
count = count + 1
END DO
END PROGRAM precision_test
Perl Code
#! /usr/bin/perl -w
use strict;
my $mach_2_meters_per_sec = 340.0;
my $velocity = 0.5 * $mach_2_meters_per_sec;
while (1) {
printf "%20.8f\n", $velocity;
exit if ($velocity == 5.0 * $mach_2_meters_per_sec);
$velocity = $velocity + 0.1 * $mach_2_meters_per_sec;
}
The commented-out line in Fortran is what I would need to use for the loop to exit normally. Notice that the threshold is set to 1E-4, which I feel is quite pathetic.
The names of the variables come from the self-study-based programming exercise I was performing and don't have any relevance.
The intent is that the loop stops when the velocity variable reaches 1700.
Here are the truncated outputs:
Perl Output
170.00000000
204.00000000
238.00000000
272.00000000
306.00000000
340.00000000
...
1564.00000000
1598.00000000
1632.00000000
1666.00000000
1700.00000000
Fortran Output
170.00000000
204.00000051
238.00000101
272.00000152
306.00000203
340.00000253
...
1564.00002077
1598.00002128
1632.00002179
1666.00002229
1700.00002280
What good is Fortran's speed and ease of parallelization if its accuracy stinks? Reminds me of the three ways to do things:
The Right Way
The Wrong Way
The Max Power Way
"Isn't that just the wrong way?"
"Yeah! But faster!"
All kidding aside, I must be doing something wrong.
Does Fortran have inherent limitations on numerical accuracy compared to other languages, or am I (quite likely) the one at fault?
My compiler is gfortran (gcc version 4.1.2), Perl v5.12.1, on a Dual Core AMD Opteron # 1 GHZ.
Your assignment is accidentally converting the value to single precision and then back to double.
Try making your 0.1 * be 0.1D0 * and you should see your problem fixed.
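The effect can be reproduced outside Fortran. The sketch below is Python and uses the standard struct module to round 0.1 to single precision, which mimics what the literal 0.1 is under the assumption that the default real kind is IEEE single precision. The helper names to_single and run are made up for this illustration.

```python
import struct

def to_single(x):
    """Round a double to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

MACH = 340.0

# Mimics Fortran's `0.1 * MACH`: 0.1 is rounded to single precision first.
step_single = to_single(0.1) * MACH
# Mimics `0.1D0 * MACH`: 0.1 is kept in double precision.
step_double = 0.1 * MACH

def run(step, n_steps=45):
    """Start at 0.5 * MACH and add `step` 45 times (170 up to 1700)."""
    v = 0.5 * MACH
    for _ in range(n_steps):
        v += step
    return v

v_single = run(step_single)
v_double = run(step_double)

print(f"{step_single:20.8f}  {v_single:20.8f}")
print(f"{step_double:20.8f}  {v_double:20.8f}")
```

With the single-precision constant each step is about 34.00000051 instead of 34, and after 45 steps the total is about 1700.0000228, matching the Fortran output in the question. With the double-precision constant the loop lands on 1700, as in the Perl run.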
As already answered, "plain" floating point constants in Fortran will default to the default real type, which will likely be single-precision. This is an almost classic mistake.
Also, using "kind=8" is not portable -- it will give you double precision with gfortran, but not with some other compilers. The safe, portable way to specify precisions for both variables and constants in Fortran >= 90 is to use the intrinsic functions, and request the precision that you need. Then specify "kinds" on the constants where precision is important. A convenient method is to define your own symbols. For example:
integer, parameter :: DR_K = selected_real_kind (14)
REAL(DR_K), PARAMETER :: MACH_2_METERS_PER_SEC = 340.0_DR_K
real (DR_K) :: mass, velocity, energy
energy = 0.5_DR_K * mass * velocity**2
This can also be important for integers, e.g., if large values are needed. For related questions for integers, see Fortran: integer*4 vs integer(4) vs integer(kind=4) and Long ints in Fortran