Dymola: Avoid "Not enough storage for initial variable data" for large Modelica model

I am trying to simulate a large Modelica model in Dymola. This model uses several records that define time series input data (data with 900 second intervals for 1 year), which it reads via the CombiTimeTable model.
If I limit the records to only contain the data for 2 weeks (also 900 second intervals), the model simulates fine.
With the yearly data, the translation seems to run successfully, but simulation fails. The dslog file contains the message Not enough storage for initial variable data.
This happens on a Windows 10 system with 8 GB RAM as well as on a Windows 7 system with 32 GB RAM.
Is there any way to avoid this error and get the simulation to run? Thanks in advance!

The recommended way is to keep the time series data not within records (that is, in your model or library) but in external data files. The CombiTimeTable supports reading from a text file or a MATLAB MAT-file at simulation run time. You will also benefit from shorter translation times.
You can still organize your external files relative to your library by means of Modelica URIs, since the CombiTimeTable (as well as the other table blocks) already calls the loadResource function. The recommended way is to place these files in a Resources directory of your Modelica package.
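For example, here is a minimal sketch (in Python, purely for illustration; the file name, table name, and library name are made up) of exporting the yearly 900-second data to a text file in the table format that the CombiTimeTable expects:

# Sketch: write one year of 900 s data to the text-file format read by
# Modelica.Blocks.Sources.CombiTimeTable (file and table names are illustrative).
import numpy as np

t = np.arange(0, 365 * 24 * 3600 + 1, 900, dtype=float)  # time column, 900 s steps
values = np.random.rand(t.size)                           # placeholder data column
table = np.column_stack([t, values])

# The first line must be "#1", followed by "double <tableName>(<rows>,<cols>)".
header = "#1\ndouble weather({0},{1})".format(table.shape[0], table.shape[1])
np.savetxt("weather.txt", table, header=header, comments="", fmt="%.6g")

In the model you would then set something like tableOnFile=true, tableName="weather" and fileName="modelica://MyLibrary/Resources/Data/weather.txt", with the file stored under the Resources directory of your package.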

Related

Scalable approach to maintaining data files for simulations in Gatling

I have been part of building a framework with Gatling, and we are wondering whether there is a scalable way to maintain CSV files for a large number of simulations.
What we want to achieve:
A single data file might be used by multiple simulations
A single CSV file contains multiple values; some of the values are used by one simulation and other values are used by other simulations
Be able to have a default data file for each simulation
Be able to override the default data file if required and pass a new data file, perhaps from the command line.
In some cases, if I am running 10 simulations and want to override the input data file for just 1 of them, that should be possible as well.
Given the above requirements, and from my knowledge of JMeter, I am thinking of the following approach:
Hardcode the input data file for each simulation through a feeder
Have the framework read environment variables and use them to override the default data files for the respective simulations. But this would mean having a lot of environment variables in the framework, i.e. one environment variable per simulation, so that we can override the input file for just the simulation we want:
String envVarToGetInputDataFileNameForEachSimulation = System.getenv(key_of_variable);
Limitations and questions:
Is hardcoding CSV files for the respective simulations a good idea?
When the number of simulations grows, do enterprise frameworks built for an organisation usually maintain that many CSV files, or is some other approach used?
If I want to override a default file as stated in point 2 of the approach, is it a good idea to use one environment variable per simulation, or is there a better way? With this approach, won't we end up creating too many environment variables as the number of simulations grows?
For anyone who has built large Gatling frameworks and managed a large number of input files: can you suggest a better and more scalable way of doing this?

A better approach to store, write, and read a big dataset of meteorological data

I am looking for a better way to store, write, and read meteorological data (about 30 GB in raw text format).
Currently I am using the NetCDF file format to store weather records. In this NetCDF file, I have 3 dimensions: time, climate variables, and locations. But the dimension order is the key constraint for my tasks (see below).
The first task is to update weather records every day for about 3000 weather stations. The dimension order (time, var, name) provides the best performance for writing, as the new data is added at the end of the NetCDF file.
The second task is to read all daily weather records for a station to perform analysis. The dimension order (name, var, time) provides the best performance for reading, as all records of one site are stored together.
The two tasks demand conflicting NetCDF file designs (the layout with the best performance for one task has the worst performance for the other).
My question is whether there are alternative methods, software, or data formats for storing, writing, and reading my datasets that give good performance for both tasks. As I have to repeat the two steps every day and the data analysis is time consuming, I need to find the best way to minimize I/O.
Thanks for any suggestions. Please let me know if my question is not clear.
OK, what you need is chunking. I created a small Python script to test; without chunking, it basically confirms your observation that access is slow along one dimension. I tested with 3000 stations, 10 variables per station, and 10000 timesteps. I put stations and variables into the same dimension for testing, but it should give similar results in the 3D case if you really need it.
My test output without chunking:
File chunking type: None
Variable shape: (30000, 10000)
Total time, file creation: 13.665503025054932
Average time for adding one measurement time: 0.00136328568459 0.00148195505142 0.0018851685524
Read all timeseries one by one with single file open
Average read time per station/variable: 0.524109539986
And with chunking:
File chunking type: [100, 100]
Variable shape: (30000, 10000)
Total time, file creation: 18.610711812973022
Average time for adding one measurement time: 0.00185681316853 0.00168470859528 0.00213300466537
Read all timeseries one by one with single file open
Average read time per station/variable: 0.000948731899261
What you can see is that chunking increases the write time (by roughly a third in this test), but dramatically improves the read time. I did not try to optimise the chunk sizes, just tested that chunking works in the right direction.
Feel free to ask if the code is not clear or you are not familiar with Python.
# -*- coding: utf-8 -*-
from time import time

import numpy as np
from netCDF4 import Dataset

test_dataset_name = 'test_dataset.nc4'
num_stations = 3000
num_vars = 10
chunks = None
#chunks = [100, 100]

def create_dataset():
    ff = Dataset(test_dataset_name, 'w')
    ff.createDimension('Time', None)
    ff.createDimension('Station_variable', num_stations * num_vars)
    if chunks:
        var1 = ff.createVariable('TimeSeries', 'f8',
                                 ('Station_variable', 'Time'),
                                 chunksizes=chunks)
    else:
        var1 = ff.createVariable('TimeSeries', 'f8',
                                 ('Station_variable', 'Time'))
    return ff

def add_data(ff, timedim):
    var1 = ff.variables['TimeSeries']
    var1[0:1000, timedim] = timedim * np.ones((1000), 'f8')

def dataset_close(inds):
    inds.close()

## CREATE DATA FILE
time_start = time()
time1 = []
time2 = []
time3 = []
time4 = []
testds = create_dataset()
dataset_close(testds)
for i in range(10000):
    time1.append(time())
    ff = Dataset(test_dataset_name, 'a')
    time2.append(time())
    add_data(ff, i)
    time3.append(time())
    ff.sync()
    ff.close()
    time4.append(time())
time_end = time()
time1 = np.array(time1)
time2 = np.array(time2)
time3 = np.array(time3)
time4 = np.array(time4)

## READ ALL STATION-VARIABLE COMBINATIONS AS ONE TIMESERIES
ff = Dataset(test_dataset_name, 'r')

## PRINT DATA FILE CREATION SUMMARY
print("File chunking type:", chunks)
print("Variable shape:", ff.variables['TimeSeries'][:].shape)
print("Total time, file creation:", time_end - time_start)
print("Average time for adding one measurement time: ",
      np.mean(time4 - time1),
      np.mean(time4[:100] - time1[:100]),
      np.mean(time4[-100:] - time1[-100:]))
print("Read all timeseries one by one with single file open")
time_rstart = []
time_rend = []
for i in range(0, ff.variables['TimeSeries'][:].shape[0],
               int(ff.variables['TimeSeries'][:].shape[0] / 100)):
    time_rstart.append(time())
    dataline = ff.variables['TimeSeries'][i, :]
    time_rend.append(time())
time_rstart = np.array(time_rstart)
time_rend = np.array(time_rend)
print("Average read time per station/variable: ", np.mean(time_rend - time_rstart))

Best Time Series Format for Querying and Converting to Matlab (HDF5)

I have somewhat of a unique problem that looks similar to the problem here:
https://news.ycombinator.com/item?id=8368509
I have a high-speed traffic analysis box that is capturing at about 5 Gbps, and picking out specific packets from this to save into some format in a C++ program. Each day there will probably be 1-3 TB written to disk. Since it's network data, it's all time series down to the nanosecond level, but it would be fine to save it at second or millisecond level and have another application sort the embedded higher-resolution timestamps afterwards. My problem is deciding which format to use. My main requirements are:
Be able to write to disk at about 50 MB/s continuously with several different timestamped parameters.
Be able to export chunks of this data into MATLAB (HDF5).
Query this data once or twice a day for analytics purposes
Another nice thing that's not a hard requirement is:
There will be 4 of these boxes running independently, and it would be nice to query across all of them and combine data if possible. I should mention all 4 of these boxes are in physically different locations, so there is some overhead in sharing data.
The second one is something I cannot change because of legacy applications, but I think the first is more important. The types of queries I may want to export into MATLAB are something like "Pull metric X between time Y and Z", so this would eventually have to go into an HDF5 format. There is an external library called MatIO that I can use to write MATLAB files if needed, but it would be even better if there wasn't a translation step. I have read the entire thread mentioned above, and there are many options that appear to stand out: kdb+, Cassandra, PyTables, and OpenTSDB. All of these seem to do what I want, but I can't really figure out how easy it would be to get the data into the MATLAB HDF5 format, and whether any of these would make it harder than the others.
If anyone has experience doing something similar, it would be a big help. Thanks!
A KDB+ tickerplant is certainly capable of capturing data at that rate; however, there are a lot of things you need to get right (whatever solution you pick):
Do the machine(s) that are capturing the data have enough cores? It is best to taskset a tickerplant, for example, to a core that nothing else will contend with.
Similarly with disk: use an SSD, and be sure there is no contention on the bus.
Separate the workload: you can write different types of data (maybe packets can be partitioned by source or stream?) to different CPUs/disks/tickerplant processes.
Basically, there are lots of ways you can cut this. I can say, though, that with the appropriate hardware KDB+ could do the job. However, given that you want HDF5, it is probably even better to have a simple process capturing the data and writing/converting it to disk on the fly.
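If you do write HDF5 directly from the capture process, a minimal sketch of the idea (shown here with Python and h5py purely for illustration; the file name, dataset name, and chunk size are made up, and the same layout can be produced from C++ with the HDF5 C/C++ API) is an extendable, chunked dataset of timestamped samples that MATLAB can later read with h5read:

# Sketch: append timestamped metric samples to an extendable HDF5 dataset.
# MATLAB can read it back with h5read('metrics.h5', '/metric_x').
import h5py
import numpy as np

with h5py.File('metrics.h5', 'w') as f:
    # Unlimited first dimension so the capture process can keep appending.
    dset = f.create_dataset('metric_x', shape=(0, 2), maxshape=(None, 2),
                            dtype='f8', chunks=(65536, 2))
    # Each row: [timestamp_seconds, value]; append a batch of samples.
    batch = np.column_stack([np.arange(1000, dtype='f8'), np.random.rand(1000)])
    dset.resize(dset.shape[0] + batch.shape[0], axis=0)
    dset[-batch.shape[0]:, :] = batch

# "Pull metric X between time Y and Z": read back only the matching rows.
with h5py.File('metrics.h5', 'r') as f:
    data = f['metric_x'][:]
    window = data[(data[:, 0] >= 100.0) & (data[:, 0] < 200.0)]
    print(window.shape)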

FORTRAN: Best way to store large amount of data which is readable in MATLAB

I am working on developing an application in Fortran where I have points defining quadrilateral panels on the surface of an object. I am calculating various parameters on these quadrilateral panels for a number of frequencies.
The output file should look like:
FREQUENCY,PANEL_NUMBER,X1,Y1,Z1,X2,Y2,Z2,X3,Y3,Z3,X4,Y4,Z4,AREA,PRESSURE,....
0.01,1,....
0.01,2,....
0.01,3,....
.
.
.
.
0.01,2000,....
0.02,1,....
0.02,2,....
.
.
.
0.02,2000,...
.
.
I am expecting a maximum of 300,000 rows with 30 columns. Data types are composed of integer, real and complex numbers. I want to store this file and later read the file in MATLAB to create a 3D geometry which I will color based on pressure at each panel.
The problem is, as you can see from the file structure, there is a lot of data. I am currently writing this as a CSV file and the size is about 26 GB.
I do not want to use a database to handle this. Could anyone suggest what file format I should use to write this data from Fortran?
Thanks for your help,
Amitava
Store the data in the native format of the computer rather than in a human-readable file in which the numbers have been converted to base 10 and characters. This will produce the smallest file and the fastest to process. On the Fortran open statement, use form='unformatted', access='stream'. The first causes the file to be unformatted, the second causes Fortran not to include its usual record-length information, which is Fortran specific. This omission makes the file more portable to other languages. Someone else can help better with how to read the file in MATLAB; I found this on the web: http://www.mathworks.com/help/matlab/import_export/importing-binary-data-with-low-level-i-o.html
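As a rough illustration of how another language reads such a stream file (this is not MATLAB, and the record layout below is made up; it must match whatever the Fortran program actually writes), the whole file can be mapped onto a fixed binary record description, for example with NumPy:

# Sketch: read a Fortran stream/unformatted file whose records were written as
# (real64 frequency, int32 panel_number, 12 real64 corner coordinates).
# The dtype must mirror the exact write order and kinds used in the Fortran code.
import numpy as np

record = np.dtype([('frequency', '<f8'),
                   ('panel', '<i4'),
                   ('coords', '<f8', (12,))])

data = np.fromfile('panels.bin', dtype=record)
print(data['frequency'][:5], data['coords'].shape)

MATLAB's low-level fread (as described in the linked documentation) works the same way: you read the fields in the order and with the types in which they were written.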
UPDATE: This approach has several assumptions. It might not work easily if you wish to transport the file between different types of computers. Your question implies that you want many rows of identical structure; identical rows simply map to a file with that number of identical records. It seems that you want to read the entire file, in which case a sequential file is appropriate. If you wish to read "random" records, a Fortran direct-access file might be useful. With the simplicity of identical records, using a native file format seems easy. If you want self-documentation or portability across computers (different numeric representations), a file format such as HDF or FITS would be useful.
I second @steabert's mention of NetCDF, and there's also HDF5 (on which the NetCDF-4 format is built). However, it does depend on what you mean by "data types": they are best used with regular/rigid data layouts, and NetCDF's support for Fortran derived types can be painful at times.
Possible advantages for cases with large lumps of data are transparent compression; data checksumming; and possibly more natural random access (that is, no need to compute seek positions based on an array index) compared with Fortran stream access. That is on top of the usual benefits of a self-documenting and portable file format.
MATLAB has inbuilt support for reading these files, and recent versions also support the OPeNDAP framework so you wouldn't even need to have the file on the same (or multiple) machine(s).
Of course, disadvantages: extra software; extra skills development (especially for HDF5); and increased code complexity on the Fortran side.
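To make the layout concrete, here is a small sketch of one possible NetCDF structure for this data (shown with the netCDF4 Python module purely for illustration; the variable names and dimension sizes are made up, and the same structure can be created from Fortran with the NetCDF Fortran API):

# Sketch of a possible NetCDF layout: panel geometry stored once, and
# per-frequency results stored along an unlimited dimension.
import numpy as np
from netCDF4 import Dataset

with Dataset('panels.nc', 'w') as ds:
    ds.createDimension('frequency', None)   # unlimited: grows with each frequency step
    ds.createDimension('panel', 2000)
    ds.createDimension('corner', 4)
    ds.createDimension('xyz', 3)

    freq = ds.createVariable('frequency', 'f8', ('frequency',))
    corners = ds.createVariable('corners', 'f8', ('panel', 'corner', 'xyz'))
    area = ds.createVariable('area', 'f8', ('panel',))
    # NetCDF has no native complex type, so store pressure as real/imaginary parts.
    p_re = ds.createVariable('pressure_re', 'f8', ('frequency', 'panel'), zlib=True)
    p_im = ds.createVariable('pressure_im', 'f8', ('frequency', 'panel'), zlib=True)

    corners[:] = np.zeros((2000, 4, 3))     # placeholder geometry
    area[:] = 1.0
    freq[0] = 0.01
    p_re[0, :] = np.random.rand(2000)
    p_im[0, :] = np.random.rand(2000)

MATLAB can then read individual variables with ncread('panels.nc', 'pressure_re') and related functions.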

Data management in matlab versus other common analysis packages

Background:
I am analyzing large amounts of data using an object-oriented composition structure for sanity and easy analysis. Often the highest level of my OO hierarchy is an object that is about 2 GB when saved. Loading the data into memory is not always an issue, and populating sub-objects and then higher-level objects based on their content is much more efficient with Java memory than just loading in a lot of MAT files directly.
The Problem:
Saving these objects when they are larger than 2 GB will often fail. It is a somewhat well-known problem that I have gotten around by just deleting a number of sub-objects until the total size is below 2-3 GB. This happens regardless of how powerful the computer is; a machine with 16 GB of RAM, 8 cores, etc. will still fail to save the objects correctly. Back-versioning the save also does not help.
Questions:
Is this a problem that others have solved somehow in MATLAB? Is there an alternative I should look into that still offers a lot of high-level analysis and will NOT have this problem?
Questions welcome, thanks.
I am not sure this will help, but: do you make sure to use a recent MAT-file version? Check, for instance, the documentation for save. Quoting from that page:
'-v7.3' 7.3 (R2006b) or later Version 7.0 features plus support for data items greater than or equal to 2 GB on 64-bit systems.
'-v7' 7.0 (R14) or later Version 6 features plus data compression and Unicode character encoding. Unicode encoding enables file sharing between systems that use different default character encoding schemes.
Also, could your object by any chance be or contain a graphics handle object? In that case, it is wise to use hgsave.