What exactly is persistence? - persistence

I've searched all over the web for definitions, but I'm still confused. I've narrowed it all down to two different defintions:
"A data structure is persistent if it support saccess to multiple version" and "Persistence is the ability of an object to survive the lifetime of the OS process in which it resides".
To me, these mean different things, but maybe I'm just not getting it. Could someone please explaing to me in a basic way what exactly persistence means?

This word means different things in different contexts:
Persistent data structures create new copies of themselves to incorporate changes (all versions are accessible AND modifiable at any time).
Persistence in your second example refers to the ability of objects to be stored in non-volatile memory, such as a hard disk. Otherwise, they would be destroyed when the OS ends its session.

Related

How to delete memory usage during an Experiment?

I am constructing an experiment in Anylogic, which saves data in the Parameter variation tab under a custom-class list. The model needs to perform a lot of simulations, and repetitions to optimize for Setting variables in the model itself. After x amount of iterations, I use a Python connector to run some code in finding new possible parameters for the underlaying model.
The problem I am having right now, is that around Simulation-run number 200, the memory usage is maximum (4Gb), and it proceeds to run super-slow. I have found some interesting ways to cut on memory usage, but I believe there is only one thing that could help me right now: let the system delete memory that is used for past iterations. After each iteration, the data of a simulation is stored, so I am fine with anylogic deleting the logs of the specific simulation afterwards.
Is such a thing possible? If so, how can I implement that?
Java makes use of a Garbage collector to manage memory usage and you have no control over it. How it works is that every now and then, based on some internal logic, it will collect and remove all instances of classes in memory that do not contain any active references and remove them.
Thus to reduce memory you must ensure that any instances that are no longer needed are not referenced by any of the objects currently active in your model.
To identify these you must use a Java profiler like JProfiler, or some of the free alternatives - see here for more.
This will show you exactly what classes are using up all your memory and with some deep diving you should be able to identify who is keeping reference to them.

IBM Window Services (DWS) csrevw function on MVS

I'm working on IBM MVS (z/OS) and trying to make Window Services working.
On the function CSREVW I don't understand what the purpose of the parameter pfcount.
Acording to the documentation this will ask to the window services to read more than one block after my program references a block that is not in my window.
But how the window services is suposed to know that I tried to reference data that are not in my window? I mean, it can't know that I'm reading data out of my window if i don't call CSREVW or CSRVIEW again.
Maybe my major issue is that I have trouble to understand english but this seems clear to me...
Here is the link to the documentation, this is explained at pages 23-24 :
http://publibz.boulder.ibm.com/epubs/pdf/iea3c102.pdf
I know this is a very specific problem about an IBM service and I apologize about that.
Thank you !
Tim
I think the problem you're having is that you need to understand a little bit about how the underlying objects behind the windowing service work in virtual storage.
At the core, the various windowing services work to give you what amounts to a "private" page dataset. You allocate and reference storage, but the objects in that virtual space aren't really in memory - the system's page fault mechanism brings them in as you reference them. So yes, you're accessing data within a "window", but in reality, the data you expect to see may not be "paged in" at that moment.
Going a little deeper, when you first allocate the object, the virtual storage it's mapped to has all of the pages marked "invalid" in the underlying page table entries. That means that as soon as you touch this storage, a page fault interrupt occurs. At this point, the operating system steps in and resolves the page fault by brining the necessary data into memory, then your program continues, oblivious to all of this processing on your behalf. You're correct that you're just referencing data within the window, but there's a lot under the covers going on to support this.
This is where PFCOUNT comes in...
Let's say you have structures that are, say, 64K long inside your virtual window. It would be sloppy and slow to reference each page of this structure and cause a page fault each time. Much better would be to use PFCOUNT to cause the page you reference and all 15 other pages needed by your object to be paged-in with a single operation. Conversely, if your data was small and you were highly random about how you access it, PFCOUNT isn't going to help you - the next page you reference could be anywhere, and it's actually wasteful to have a large PFCOUNT since you end up bringing in a lot of data you never use.
Hope that makes sense - if you'd like a challenge, take yourself a system dump and examine the system trace entries as you reference data...you'll see a very distinct pattern of page faults, I/O and resumption of your program, and hopefully it will all make sense to you.
From the manual
,pfcount
Specifies the number of additional blocks you want window services to bring into the window each time your program references data that
is not already in the window. The number you specify is added to the
minimum of one block that window services always brings in. That is,
if you specify a value of 20, window services brings in up to 21. The
number of additional blocks ranges from zero through 255.
Note that you get 1 block without asking.

Akka and singleton actors

I've recently started messing around with akka's actors and http modules. However I've stumbled upon a rather annoying little quirk, namely, creating singelton actors.
Here are two examples:
1)
I have an in-memory cache, my service is quite small (its an app rather) so I really like this in memory model. I can hold most information relevant to the user in a Map (well, a map of lists, but still, quite an easy to reason about structure) and I don't get the overhead and complexity of a redis, geode or aerospike.
The only problem is that this in-memory chache can be modified, by multiple sources and said modifications must be synchronous. Instead of synchornizing all 3 acess methods for this structure (e.g. by building a message queue or implementing locks) I thought I'd just wrap the structure and its access methods into an actor, build in message queue, easy receive->send logic and if things scale up it will be very easy to replace with a DA actors over a dedicated in memory db.
2) I have a "Service" layer that should be used to dispatch actors for various jobs (access the database, access the in-memory cache, do this computation with data and deliver the result to the user... etc).
It makes sense of this Service layer to be a "singleton" of sorts, a closure over some functions, since it does nothing that's blocking or cpu/memory intensive in any way, it simply assigns tasks further down the line (e.g. decides how many actors/thread/w.e should be created and where a request should go)
However, this thing would require either:
a) Making both object singleton actors or
b) Making both objects actual "objects"(as in the scala object notation that designates a single named singleton with functions that have closures over its scope)
There are plenty of problems with b), namely that the service layer will either have to get an actors system "passed" to it (and I'm not sure that's a best practice) in order o create actors, rather than creating its own "childrens" it will create children's using the global actors system and the messaging and monitoring logic will be a lot more awkward and unintuitive. Also, that the in-memory cache will not have the advantage of the built in message que (I'm not saying its hard to implement one, but this seems like one of those situation where one goes "Oh, jolly, its good that I have actors and I don't have to spend time implementing and testing this code")
a) seems to have the problem of being generally speaking poorly documented and unadvised in the akka documentation. I mean:
http://doc.akka.io/docs/akka/2.4/scala/cluster-singleton.html
Look at this shit, half of the docs are warning against using it, it was its own dependency and quite frankly its very hard to read for a poor sod like me which hasn't set foot in the functional&concurrent programming ivory tower.
So, ahm. Could any of you guys explain to me why its bad to use singleton actors ? How do you design singletons if they can't be actors ? Is there any way to design singleton actors that won't cause a lot of damage down the line ? Is the whole "service" model of having "global" services that are called rather than instantiated "un akka like" ?
Just to clarify the documentation, they're not warning against using it. They're warning that there are circumstances in which using a singleton will cause problems, which are expected given the circumstances. They mention the following situations:
If the singleton is a performance bottleneck. This makes sense. If everything relies on a single object that does work slowly, everything will be slow.
If the actor needs to be non-stop available, you'll run into problems if the singleton ever goes down, because those messages can't just be handled by another instance. It will take some amount of time to re-start the singleton before its work can be resumed.
The biggest problem happens if you have auto-downing turned on. Auto-downing is a policy by which an unreachable node is assumed to be down, and removed from the network. If you do this, but the node is not actually down but just unreachable due to a network partition, both sides of the partition will decide that they're the surviving nodes and create their own singletons. So now you have two singletons. Which is, of course, not what you want from a singleton. But you should never use auto-downing outside of testing anyway. It's a terrible recovery strategy that was included for completeness and convenience in testing.
So I don't read that as recommending against using it. Just being clear about the expected pitfalls if you do use it, based on the nature of the structure.

How do you access the storage map for a cache class programmatically?

I'm trying to create a method that will automatically create something I can save and use later, which will let me automate data migration from one version of a class to the next.
When I compile a normal persistent class, a "storage" gets created, and I would like to be able to archive some code that would let me recreate this "old storage" so I could access an "older version" from a global and map the values into the current mapping of the class. I have objectgenerator methods that will save a version with every record, and detect if the "current version" of the class code is different from the version saved in the data record itself, but I'm not sure how to retain what the "old" version actually looked like so I can auto-migrate the data from the "old to the new". To do this, I'm thinking that being able to read what the current storage is at compile time, I should be able to save that value elsewhere and create a durable "version reader" so I can migrate the data forward without having to actually have the programmer do the work.
Does this seem like a reasonable approach? and if so, can anyone point me to where, at compile time, the storage values are so I can do the saving of the data?
(or should I be looking at cloning some part of the generated %Save() method chain at compile time to a version specific save attached to "something else" (not sure what, but 'something'))
This question and answer originally appeared in the InterSystems Developer Community https://community.intersystems.com/post/how-do-you-access-storage-map-cache-class-programmatically
In a database, your most important asset is your data -- business logic and code comes a close second. There should at least be someone paying attention to the data slots for sanity, even though in most cases, there is nothing to do. At the very least, any developer adding or modifying properties should "diff" the storage definition when submitting code to the source repository. The class compiler by default will handle everything correctly.
The simplest and best way to move from one version of a class to the next is to keep the same storage definition. The class compiler will create slots for any new properties and is completely non-destructive and conservative. When I retire properties, I generally manually rename the storage slot and give it a prefix such as zzz. This allows me to explicitly clean up and reuse these slots later if I so choose.
For the type of before/after compile triggers you are looking for, I would use the RemoveProjection/CreateProjection methods of a projection class which provide "a way to customize what happens when a class is compiled or removed".
You can use %Dictionary.CompiledStorage and related classes(Data) to have full access to the compiled storage definition of a class.
3a. An alternate approach is to use XSLT or XML parsing to read the storage definition from the exported class definition. I would only use this alternative if you need to capture details for separate source control purposes.
The simplest storage definitions use $ListBuild slots and a single global node. When subclasses and collection properties are used, the default storage definition gets more complicated -- that is where you will really want to stick to the simple approach (item 1).

EF - multiple includes to eager load hierarchical data. Bad practice?

I am needing to eager load a hierarchy structure so that I can recursively iterate through it. The eager loading is necessary to prevent multiple db queries while traversing the tree. It seems the consensus is that you can't eager load infinite levels of the tree, so I did something like
var item= db.ItemHierarchies
.Include("Children.Children.Children.Children.Children")
.Where(x => x.condition == condition)
to load 5 levels of children. This seems to get the job done. I'm wondering what the drawback is to doing this? If there is none then theoretically could I add 50 levels of includes here without slowing things down?
I recommend taking a look at the SQL that is generated as you add eager loading to your query.
var item= db.ItemHierarchies
.Include("Children")
.Include("Children.Children")
.Include("Children.Children.Children")
.Include("Children.Children.Children.Children")
.Include("Children.Children.Children.Children.Children")
var sql = ((System.Data.Objects.ObjectQuery) item).ToTraceString()
// http://visualstudiomagazine.com/blogs/tool-tracker/2011/11/seeing-the-sql.aspx
You'll see that the SQL quickly gets very big and complicated and can potentially have serious performance implications. You'd do well to limit your eager loading to data that you are certain you will need and to consider using explicit loading for some of the related entities - especially if you're working with connected entities in which case you can explicitly load collection properties when they're needed.
Also note that you may not need multiple separate Includes. For example, the following needs to be separate Includes because they're addressing separate properties (Widgets and Spanners) of the root.
var item= db.ItemHierarchies
.Include("Widgets")
.Include("Spanners.Flanges")
But the following isn't necessary:
var item= db.ItemHierarchies
.Include("Widgets") //This isn't necessary.
.Include("Widgets.Flanges") //This loads both Widges and Flanges.
Well honestly.. It's an extremely bad practice.
Let's assume you had 50 objects in your root.. and 50 per level.
You may end up retrieving 312500000 "capsules" of information.
Now, one might ask: "So what is wrong with that?!",
I mean if that is what is required than why not do that..
Rule #1: we develop software that should be used by human beings.
And the fact is that no human capable of taking a glimpse at 312500000 items of information at once and learn or conclude something beneficial out of it. (except.. that it does not help him or her to watch it)
Rule #2: UI should be based on what is needed and not what is possible.
And since we already established that showing 312500000 capsules of data is not needed there is no reason to bring all that at once.
And now you might come forward and say - But I don't care about the UI, really! All I need is to iterate in that data in order to process some information!
In that case you would probably want to save your results somewhere for future reference, but that means that its a batch job.. so why not apply batch job rules upon it.. like process it item by item which will also may give you the benefit of splitting it between even more machines if needed.
So you see.. no matter which path you choose there should be no reason to do it.
(= definition of what is a bad practice.)
Update:
After reading interesting concerns in the comments, I would like to update this answer with more analysis:
Deciding what is a bad practice must always be in reference to what is to be achieved or what is the role of each part in the system. In the current situation (after reading the comments) it has been brought or implied that the data storage is actually a persistent medium for objects opposed to a different concept where the data is the 'heart' of the application.
We can define two data types:
1) Data-Center which is being used in data-centric applications such as banks, CRM, ERP, websites or other service based solutions.
VS.
2) Data-Persistence medium which is being used as data to be saved for when the application is not active, in example: any simple app save file or any game save file and etc.
The main difference is that a data persistence medium is to be accessed only by a single instance of the app at a single point in time.. meaning the data is not designed to be shared by many instances. if the data is to be shared - we are dealing with a data-center application.
If your app just need a data-persistence medium - loading all the information cannot be considered as a bad practice - but you still need to make sure you are not exploding the memory. and in that frame of work, SQL Server might not be what you need or the best tool to use.
In the other case of Data-Centric application - my original answer remains as it will be a bad practice to bring all the information per instance of the application.