How can I free up memory during an experiment in AnyLogic?

I am constructing an experiment in AnyLogic that saves data in the Parameter Variation tab in a list of a custom class. The model needs to perform a lot of simulations and repetitions to optimize the setting variables of the model itself. After a certain number of iterations, I use a Python connector to run some code that finds new candidate parameters for the underlying model.
The problem I am having right now is that around simulation run number 200, memory usage hits its maximum (4 GB) and everything proceeds to run extremely slowly. I have found some interesting ways to cut memory usage, but I believe only one thing can help me at this point: letting the system release the memory used by past iterations. After each iteration the data of a simulation is stored, so I am fine with AnyLogic deleting the logs of that specific simulation afterwards.
Is such a thing possible? If so, how can I implement that?

Java uses a garbage collector to manage memory, and you have no direct control over it. Every now and then, based on its internal logic, it finds all instances in memory that are no longer reachable through any active reference and removes them.
Thus, to reduce memory you must ensure that instances that are no longer needed are not referenced by any of the objects currently active in your model.
To identify these, use a Java profiler like JProfiler or one of the free alternatives.
It will show you exactly which classes are using up your memory, and with some deep diving you should be able to identify what is keeping references to them.
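To illustrate the pattern, here is a minimal Java sketch, not AnyLogic API; the RunResult class and the "keep only a summary" policy are assumptions. The idea is to extract the small aggregate you need from each run and then drop every reference to the bulky per-run object so the garbage collector can reclaim it:

import java.util.ArrayList;
import java.util.List;

public class ExperimentMemoryDemo {
    // Hypothetical per-run result; in AnyLogic this would be your custom class.
    static class RunResult {
        final double[] samples = new double[1_000_000]; // ~8 MB per run
    }

    static double summarize(RunResult r) {
        double sum = 0;
        for (double v : r.samples) sum += v;
        return sum / r.samples.length;
    }

    public static void main(String[] args) {
        List<Double> summaries = new ArrayList<>(); // small and long-lived
        for (int run = 0; run < 500; run++) {
            RunResult r = new RunResult();    // one simulation's raw data
            summaries.add(summarize(r));      // keep only the aggregate
            // `r` goes out of scope here; since nothing long-lived points to
            // it, the GC can reclaim its ~8 MB before the next run.
        }
        System.out.println("Kept " + summaries.size() + " summaries");
    }
}

If, instead, each RunResult were appended to a long-lived list on the experiment, the GC could never reclaim them and memory would grow with every run, which matches the symptom described in the question.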

Related

Memory usage: global variables

I am writing an app that requires access to a set of information from all classes and methods.
My query is, what's the most efficient way of doing this, so as to minimise memory usage?
I have taught myself coding and have, of course, come across numerous different methods to solve this whilst trawling the internet. For example:
I could create a global variable such as var info = ..., which I'd place above a class definition, thus giving access from anywhere in the app.
Or I could create a singleton, GameData and have the data stored there, e.g. GameData.shared.info, similarly available from anywhere in the app.
Or I could load the information in the first ViewController and then pass it around as a parameter.
There are no doubt more methods that I haven't come across, but I wonder which is the most memory-efficient method, if indeed there is such a thing. In my case, I won't need access to a huge amount of data - no more than say sixty or seventy records each with half a dozen fields of text or numbers.
Many thanks
Using dependency injection, that is, passing the data as a parameter to any object that requires it, would be the more memory-conscious method, since the variable stops existing "automatically" once you replace the whole hierarchy (as long as it's done correctly).
With a singleton or a global variable, you would have to clear the value yourself.
If the value is never going to disappear during the lifetime of the application, then memory usage doesn't matter either way, but I would still advise against global variables and suggest using dependency injection.
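To make the contrast concrete, here is a minimal sketch of constructor-based injection, written in Java for illustration (the class names are hypothetical; the same shape applies in Swift using init parameters):

import java.util.List;

class GameData {
    final List<String> records; // the shared information, loaded once
    GameData(List<String> records) { this.records = records; }
}

class GameScreen {
    private final GameData data; // injected, not fetched from a global
    GameScreen(GameData data) { this.data = data; }
}

public class WiringDemo {
    public static void main(String[] args) {
        GameData data = new GameData(List.of("record1", "record2"));
        GameScreen screen = new GameScreen(data);
        // When `screen` and every other holder of `data` are replaced, no
        // reference remains and the data is reclaimed automatically, unlike
        // a singleton, which pins it for the life of the app unless cleared.
        System.out.println("records: " + data.records.size());
    }
}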

IBM Window Services (DWS) csrevw function on MVS

I'm working on IBM MVS (z/OS) and trying to get Window Services working.
For the CSREVW function, I don't understand the purpose of the pfcount parameter.
According to the documentation, it asks window services to read more than one block after my program references a block that is not in my window.
But how is window services supposed to know that I tried to reference data that is not in my window? I mean, it can't know that I'm reading data outside my window if I don't call CSREVW or CSRVIEW again.
Maybe my main issue is that I have trouble understanding English, even though the explanation seems clear...
Here is the link to the documentation; this is explained on pages 23-24:
http://publibz.boulder.ibm.com/epubs/pdf/iea3c102.pdf
I know this is a very specific question about an IBM service, and I apologize for that.
Thank you!
Tim
I think the problem you're having is that you need to understand a little bit about how the underlying objects behind the windowing service work in virtual storage.
At the core, the various windowing services work to give you what amounts to a "private" page dataset. You allocate and reference storage, but the objects in that virtual space aren't really in memory - the system's page fault mechanism brings them in as you reference them. So yes, you're accessing data within a "window", but in reality, the data you expect to see may not be "paged in" at that moment.
Going a little deeper, when you first allocate the object, the virtual storage it's mapped to has all of its pages marked "invalid" in the underlying page table entries. That means that as soon as you touch this storage, a page fault interrupt occurs. At this point, the operating system steps in and resolves the page fault by bringing the necessary data into memory, and then your program continues, oblivious to all of this processing on its behalf. You're correct that you're just referencing data within the window, but there's a lot going on under the covers to support this.
This is where PFCOUNT comes in...
Let's say you have structures that are 64K long inside your virtual window, that is, sixteen 4K pages. It would be sloppy and slow to reference each page of this structure and take a page fault every time. Much better to use PFCOUNT so that the page you reference and the 15 other pages needed by your object are paged in with a single operation. Conversely, if your data is small and you access it highly randomly, PFCOUNT isn't going to help you: the next page you reference could be anywhere, and a large PFCOUNT is actually wasteful, since you end up bringing in a lot of data you never use.
Hope that makes sense. If you'd like a challenge, take a system dump and examine the system trace entries as you reference data... you'll see a very distinct pattern of page faults, I/O and resumption of your program, and hopefully it will all make sense to you.
From the manual:

,pfcount
Specifies the number of additional blocks you want window services to bring into the window each time your program references data that is not already in the window. The number you specify is added to the minimum of one block that window services always brings in. That is, if you specify a value of 20, window services brings in up to 21. The number of additional blocks ranges from zero through 255.

Note that you get 1 block without asking.
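As a purely illustrative sketch of that arithmetic (the helper below is hypothetical Java, not the real CSREVW interface, which is an assembler-level call):

public class PfcountDemo {
    // Mirrors the manual's rule: one block always, plus up to pfcount
    // additional blocks, where pfcount ranges from 0 through 255.
    static int blocksPagedIn(int pfcount, int blocksLeftInObject) {
        if (pfcount < 0 || pfcount > 255)
            throw new IllegalArgumentException("pfcount must be 0..255");
        return Math.min(1 + pfcount, blocksLeftInObject);
    }

    public static void main(String[] args) {
        // "if you specify a value of 20, window services brings in up to 21"
        System.out.println(blocksPagedIn(20, 100)); // 21
        System.out.println(blocksPagedIn(20, 5));   // 5: never more than remain
    }
}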

Measure function-level memory usage in my Scala application code at runtime

First of all, this is not an "off-line" profiling task!
I am working on a Scala codebase, and what I am currently trying to do is: if a function foo consumes too much memory (let's say over 10 GB), kill that function and return a default value.
So it should look like:
try {
  monitor {
    foo() // <-- if foo has used over 10 GB of memory, just cut it off
  }
} catch {
  case MemoryUsageError => default_value
}
Note that currently foo runs in the same process as my main function.
Is it possible to do so? A quick Google search only turned up ways to show the current memory usage of a Scala application as a whole; that is not as fine-grained as what I am looking for.
Am I being clear? Could anyone shed some light here? Thanks a lot!
========================================================================
Note that what I am looking for is an "online" method, not off-line profiling. My application itself should determine the memory usage of the function foo and, if it gets too high, just cut it off.
Is it possible?
In general, the JVM doesn't track which code created an object allocated on the heap or where it was created. Tracking this would be very costly, and it doesn't matter for GC.
How to live with it
Termination
Self-controlled program. If you want to terminate some continuous computation, then the computation shouldn't be continuous: you need check points where the condition can be validated, for example at the start of every loop iteration or at the beginning of every recursive call. Obviously, a computation can consist of several different stages instead of a simple loop, but the approach is the same. A sketch of this follows below.
Separation of computation and control. For example, execute the function as a Future on a dedicated Thread and interrupt it if needed, or use a ForkJoinTask and its cancel() method.
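Here is a minimal Java sketch of the check-point approach under stated assumptions: the 10 GB threshold, the MemoryLimitExceeded exception and the workload are all made up, and Runtime-based measurement sees the whole heap rather than foo alone:

public class CheckpointDemo {
    static final long LIMIT_BYTES = 10L << 30; // hypothetical 10 GB budget

    static class MemoryLimitExceeded extends RuntimeException {}

    static double foo() {
        Runtime rt = Runtime.getRuntime();
        double acc = 0;
        for (int i = 0; i < 10_000_000; i++) {
            if (i % 10_000 == 0) { // check point
                long used = rt.totalMemory() - rt.freeMemory();
                if (used > LIMIT_BYTES) throw new MemoryLimitExceeded();
            }
            acc += Math.sqrt(i); // stand-in for one unit of real work
        }
        return acc;
    }

    public static void main(String[] args) {
        double result;
        try {
            result = foo();
        } catch (MemoryLimitExceeded e) {
            result = -1; // the "default value" from the question
        }
        System.out.println(result);
    }
}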
Measurement
Usually only one class, or a couple of classes, accounts for most of the memory. If the instances are roughly the same size, memory control can be implemented with a counter of objects. The classes of 'heavy' objects can be found by inspecting the algorithm or by using jvisualvm. Increment the counter at instance creation. Decrementing is harder: either update the counter when your code releases its references (so you count instances the GC cannot yet remove), or use a PhantomReference and decrement when the object is actually reclaimed (so you count all instances existing in the VM, as sketched below). But don't use finalize()!
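A hedged sketch of the PhantomReference variant in Java (the "heavy" class is whatever dominates your heap; note that the references themselves must stay strongly reachable, or they will never be enqueued):

import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class InstanceCounter {
    private static final AtomicLong live = new AtomicLong();
    private static final ReferenceQueue<Object> reclaimed = new ReferenceQueue<>();
    // Keep the PhantomReferences strongly reachable, or they get GC'd too.
    private static final Set<Reference<?>> refs = ConcurrentHashMap.newKeySet();

    // Call once per heavy instance, right after construction.
    static void register(Object heavy) {
        live.incrementAndGet();
        refs.add(new PhantomReference<>(heavy, reclaimed));
    }

    // Call at check points: drains notifications for reclaimed instances
    // and returns the number of instances still existing in the VM.
    static long liveCount() {
        Reference<?> r;
        while ((r = reclaimed.poll()) != null) {
            refs.remove(r);
            live.decrementAndGet();
        }
        return live.get();
    }
}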
The second method is the java.lang.instrument package. It lets you measure an object's size (and probably there are methods for determining the consumption of all objects of a certain class; a minimal wiring sketch follows below). You could also try measuring the available memory. The flaw is that you measure not just the objects of one particular function but all of them.
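For completeness, a sketch of wiring up java.lang.instrument (this must be packaged as an agent jar with a Premain-Class manifest entry and loaded with -javaagent; note that getObjectSize reports a shallow size, not the retained size):

import java.lang.instrument.Instrumentation;

public class SizeAgent {
    private static volatile Instrumentation inst;

    // Invoked by the JVM before main() when started with
    // -javaagent:sizeagent.jar (MANIFEST: Premain-Class: SizeAgent).
    public static void premain(String args, Instrumentation i) {
        inst = i;
    }

    // Shallow size of one object in bytes; sum over your instances yourself.
    public static long sizeOf(Object o) {
        if (inst == null) throw new IllegalStateException("agent not loaded");
        return inst.getObjectSize(o);
    }
}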
For time control, write down a timestamp at the beginning of the computation and measure the elapsed time at every check point.

How to implement deterministic single threaded network simulation

I read about how FoundationDB does its network testing/simulation here: http://www.slideshare.net/FoundationDB/deterministic-simulation-testing
I would like to implement something very similar but cannot figure out how they actually implemented it. How would one go about writing, for example, a C++ class that does what they do? Is it possible to do the kind of simulation they do without any code generation (as they presumably use)?
Also: how can a simulation be repeated if it contains random events? Each time, the simulation would choose a new random value and thus not be the same run as the one before. Maybe I am missing something here... I hope somebody can shed a bit of light on the matter.
You can find a little bit more detail in the talk that went along with those slides here: https://www.youtube.com/watch?v=4fFDFbi3toc
As for the determinism question, you're right that a simulation cannot be repeated exactly unless all possible sources of randomness and other non-determinism are carefully controlled. To that end:
(1) Generate all random numbers from a PRNG that you seed with a known value.
(2) Avoid any sort of branching or conditionals based on facts about the world which you don't control (e.g. the time of day, the load on the machine, etc.), or if you can't help that, then pseudo-randomly simulate those things too.
(3) Ensure that whatever mechanism you pick for concurrency has a mode in which it can guarantee a deterministic execution order.
Since it's easy to mess all those things up, you'll also want to have a way of checking whether determinism has been violated.
All of this is covered in greater detail in the talk that I linked above.
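As a small, hypothetical illustration of point (1) in Java (the simulated events are made up), seeding every source of randomness from one recorded value makes the run replayable:

import java.util.Random;

public class SeededSimDemo {
    public static void main(String[] args) {
        long seed = 42L; // record this with the run; replay by reusing it
        Random rng = new Random(seed);

        // Every "random" event draws from the single seeded generator, so
        // re-running with the same seed reproduces the exact event sequence.
        for (int step = 0; step < 5; step++) {
            double latencyMs = 1 + rng.nextDouble() * 9; // simulated delay
            boolean dropPacket = rng.nextDouble() < 0.1; // simulated fault
            System.out.printf("step %d: latency=%.2f ms drop=%b%n",
                    step, latencyMs, dropPacket);
        }
    }
}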
In the sims I've built, the biggest issue with repeatability ends up being proper seed management (as per the previous answer). You want your simulations to give different results only when you supply a different seed to your random number generators than before.
After that, the biggest issue I've seen tends to be making sure you don't iterate over collections with nondeterministic ordering. For instance, in Java you'd use a LinkedHashMap instead of a HashMap, as in the sketch below.
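A minimal Java illustration of that last point (the event names are made up):

import java.util.LinkedHashMap;
import java.util.Map;

public class OrderDemo {
    public static void main(String[] args) {
        // Iteration order over a plain HashMap depends on hashing and
        // capacity and may differ across JVM versions; LinkedHashMap
        // always iterates in insertion order, which keeps runs repeatable.
        Map<String, Integer> events = new LinkedHashMap<>();
        events.put("connect", 1);
        events.put("send", 2);
        events.put("disconnect", 3);
        events.keySet().forEach(System.out::println); // always the same order
    }
}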

EF - multiple includes to eager load hierarchical data. Bad practice?

I need to eager-load a hierarchical structure so that I can iterate through it recursively. The eager loading is necessary to prevent multiple DB queries while traversing the tree. The consensus seems to be that you can't eager-load an unlimited number of tree levels, so I did something like
var item = db.ItemHierarchies
    .Include("Children.Children.Children.Children.Children")
    .Where(x => x.condition == condition);
to load 5 levels of children. This seems to get the job done. I'm wondering what the drawback of doing this is. If there is none, then could I theoretically add 50 levels of Includes here without slowing things down?
I recommend taking a look at the SQL that is generated as you add eager loading to your query.
var item = db.ItemHierarchies
    .Include("Children")
    .Include("Children.Children")
    .Include("Children.Children.Children")
    .Include("Children.Children.Children.Children")
    .Include("Children.Children.Children.Children.Children");

var sql = ((System.Data.Objects.ObjectQuery) item).ToTraceString();
// http://visualstudiomagazine.com/blogs/tool-tracker/2011/11/seeing-the-sql.aspx
You'll see that the SQL quickly gets very big and complicated and can have serious performance implications. You'd do well to limit your eager loading to data that you are certain you will need, and to consider using explicit loading for some of the related entities, especially if you're working with connected entities, in which case you can explicitly load collection properties when they're needed.
Also note that you may not need multiple separate Includes. For example, the following needs separate Includes because they address separate properties (Widgets and Spanners) of the root:
var item = db.ItemHierarchies
    .Include("Widgets")
    .Include("Spanners.Flanges");
But the following isn't necessary:
var item = db.ItemHierarchies
    .Include("Widgets")          // This isn't necessary.
    .Include("Widgets.Flanges"); // This loads both Widgets and Flanges.
Well, honestly... it's an extremely bad practice.
Let's assume you had 50 objects in your root and 50 children per level.
You may end up retrieving on the order of 50^5 = 312,500,000 "capsules" of information.
Now, one might ask: "So what is wrong with that?!" I mean, if that is what is required, then why not do it?
Rule #1: we develop software that is meant to be used by human beings.
And the fact is that no human is capable of glancing at 312,500,000 items of information at once and learning or concluding anything beneficial from them (except that looking at them doesn't help).
Rule #2: the UI should be based on what is needed, not on what is possible.
And since we have already established that showing 312,500,000 capsules of data is not needed, there is no reason to bring them all in at once.
And now you might come forward and say: "But I don't care about the UI, really! All I need is to iterate over that data in order to process some information!"
In that case you would probably want to save your results somewhere for future reference, but that means it's a batch job, so why not apply batch-job rules to it, like processing it item by item? That may also give you the benefit of splitting the work across more machines if needed.
So you see, no matter which path you choose, there should be no reason to do it.
(Which is the definition of a bad practice.)
Update:
After reading the interesting concerns in the comments, I would like to update this answer with more analysis:
Deciding what is a bad practice must always be done in reference to what is to be achieved and to the role of each part in the system. In the current situation (after reading the comments) it has been stated or implied that the data storage is actually a persistence medium for objects, as opposed to a different concept where the data is the "heart" of the application.
We can distinguish two kinds of data:
1) Data-center data, used in data-centric applications such as banks, CRM, ERP, websites or other service-based solutions.
vs.
2) A data-persistence medium, used to save data for when the application is not active, for example a simple app's save file or a game's save file.
The main difference is that a data-persistence medium is accessed by only a single instance of the app at a single point in time; the data is not designed to be shared by many instances. If the data is to be shared, we are dealing with a data-centric application.
If your app just needs a data-persistence medium, loading all the information cannot be considered a bad practice, but you still need to make sure you are not blowing up the memory; and in that frame of work, SQL Server might not be what you need or the best tool to use.
In the other case, a data-centric application, my original answer remains: it would be a bad practice to bring in all the information per instance of the application.