Writing to/loading from a file vs. storing in GUI appdata - matlab

I am doing calculations on an image and intend to save a variable to a .mat file. When accessing the variable, would it be faster for me to load from the file or store the variable to the GUI appdata?
Also, I noticed that when I originally didn't save the variable (89x512x512 double array), it ran much faster. Is saving to a file generally time expensive?

You already have that array in memory, so storing it via setappdata/getappdata is certainly the faster alternative and, given the moderate size, doesn't have any real drawback.
So, no reason to store it to a file, imho.
And yes, writing to a file is comparatively slow; among other things, it takes a certain minimum amount of time no matter how tiny your data is.
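A minimal sketch of the appdata approach (assuming a figure handle `hFig` and your computed result in `A`; names are made up):

```matlab
% Store the computed array in the figure's appdata (stays in memory)
setappdata(hFig, 'imgStack', A);   % A is the 89x512x512 double

% Later, in any callback that can see hFig, pull it back out
A = getappdata(hFig, 'imgStack');
```

No disk I/O is involved, so retrieval is essentially free compared to a load from a .mat file.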

Related

Memory issues with large amounts of data stored as nested cells in MATLAB

I have large amounts of data stored as nested cells in .mat files. My biggest problem right now is the load times for accessing these files, but I'm wondering if the underlying problem is that I came up with an inefficient way for storing the data and I should restructure it to be smaller.
The full file consists of a cell array:
Hemi{1,h} where there are 52 versions of h
.{n,p} where there are 85 versions of n and up to ~100 versions of p
.Variable where there are 10 variables, each with ~2500 values
This full file ate up all my memory, so I saved it in parts, i.e.:
Hemi1.mat=Hemi{1,1}
Hemi2.mat=Hemi{1,2}
etc.
The next step for this application is to load each file, determine which part of it is an appropriate solution (I need Hemi{1,h}.{n,p}.Var1, Hemi{1,h}.{n,p}.Var2, and Hemi{1,h}.{n,p}.Var3 for this, but I still need to keep track of the other Variables), save the solution, then close the file and move to the next one.
Is there a faster way to load these files?
Is the problem less my dataset and more how I've chosen to store it? Is there a better alternative?
That is quite a lot of data. I have a few suggestions you could look into. The first is to see if you can change the datatypes to something like categorical arrays, which are far more memory efficient. Also, if you are storing strings as your final data, that can be quite heavy storage-wise.
Second you could look into HDF5 file storage. I hear it is a nice way to store structured data.
Finally, you could try converting your {n,p} arrays into tables. I am not sure whether this is better for memory, but tables are nice to work with and may help you out (they require R2013b or later).
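The categorical suggestion can be sketched like this (variable names are made up; the point is that repeated strings collapse to one copy plus small integer codes):

```matlab
% A cell array of repeated strings duplicates every character
labels = repmat({'left_hemisphere'}, 2500, 1);

% categorical stores each unique string once, plus compact codes
labelsCat = categorical(labels);

whos labels labelsCat   % compare the Bytes column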
I hope this helps!
-Kyle

NetLogo BehaviorSpace memory size constraint

In my model I'm using BehaviorSpace to carry out a number of runs, with variables changing for each run and the output being stored in a *.csv for later analysis. The model runs fine for the first few iterations, but quickly slows as the data grows. My question is: will file-flush, when used in BehaviorSpace, help this? Or is there a way around it?
Cheers
Simon
Make sure you are using table-format output and that spreadsheet format is disabled. At http://ccl.northwestern.edu/netlogo/docs/behaviorspace.html we read:
Note however that spreadsheet data is not written to the results file until the experiment finishes. Since spreadsheet data is stored in memory until the experiment is done, very large experiments could run out of memory. So you should disable spreadsheet output unless you really want it.
Note also:
doing runs in parallel will multiply the experiment's memory requirements accordingly. You may need to increase NetLogo's memory ceiling (see this FAQ entry).
where the linked FAQ entry is http://ccl.northwestern.edu/netlogo/docs/faq.html#howbig
Using file-flush will not help. It flushes any buffered data to disk, but only for a file you opened yourself with file-open, and anyway, the buffer associated with a file is fixed-size, not something that grows over time. file-flush is really only useful if you're reading from the same file from another process during a run.

lazy evaluation of encrypted text stored in large NSArray

I have to store about 10k text lines in an Array. Each line is stored as a separate encrypted entry. When the app runs I only need to access a small number and decrypt them - depending on user input. I thought of some kind of lazy evaluation but don't know how to do it in this case.
This is how I build up my array: [allElements addObject:@"wdhkasuqqbuqwz"]. The string is encrypted. Accessing an entry looks like txt = [[allElements objectAtIndex:n] decrypt]
The problem currently is that this uses lots of memory from the very start - most of the items I don't need anyway, just don't know which ones ;). Also I am hesitant to store the text externally eg in a textfile, since this would make it easier to access it.
Is there a way to minimize memory usage in such a case?
ps initialization is very fast, so no issue here
So it's quite a big array, although not really big enough to be triggering any huge memory warnings (unless my maths has gone horribly wrong, I reckon your array of 10,000 40-character strings is about 0.76 MB). Perhaps there are other things going on in your app causing these warnings: are you loading any large images or many assets?
What I'm a little confused about is how you're currently storing these elements before you initialise the array. You say you don't want to store the text externally in a text file, but you must be holding the strings in some kind of file before initialising your array, unless of course your values are generated on the fly.
If you've encrypted correctly, you shouldn't need to care whether your values are stored in plain-sight or not. Hopefully you're using an established standard and not rolling your own encryption, so really I think worrying about users getting hold of the file is a moot point. After all, the whole point of encryption is being able to hide data in plain sight.
I would recommend, as a couple of your commenters already have, that you use some form of database storage. Core Data was made for this purpose: handling large amounts of data with minimal memory impact. But again, I'm not sure how that array alone could trigger a memory warning, so I suspect there's other stuff going on in your app that's eating up your memory.

What is the Storable module used for?

I am having a hard time understanding what Storable does.
I know that it "stores" a variable into your disk, but why would I need to do that? What would I use this module for, and how would I do it?
Reasons that spring to mind:
Persist memory across script calls
Sharing variables across different processes (sometimes it isn't possible to pipe stuff)
Of course, that's not all that Storable does. It also:
Makes it possible to create deep clones of data structures
Serializes the data structure stored, which implies a smaller file footprint than output from Data::Dump
Is optimized for speed (so it's faster to retrieve than to require a file containing Data::Dump output)
One example:
Your program spends a long time populating a data structure, say a graph or a trie; if the program crashes, you'd lose it all and have to start again from square one. To avoid losing this data and to be able to continue where the program stopped last time, you can save a snapshot of the data to a file manually, or simply use Storable.
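A minimal sketch of the snapshot idea (the file name `snapshot.sto` and the structure are arbitrary):

```perl
use strict;
use warnings;
use Storable qw(store retrieve dclone);

# Expensive-to-build structure
my %graph = ( a => [ 'b', 'c' ], b => [ 'c' ] );

# Snapshot it to disk ...
store( \%graph, 'snapshot.sto' );

# ... and on a later run, pick up where you left off
my $restored = retrieve('snapshot.sto');

# dclone gives a deep copy, independent of the original
my $copy = dclone( \%graph );
```

store/retrieve cover the persistence use cases above, and dclone is the deep-cloning feature mentioned in the list.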

Efficient disk access of large number of small .mat files containing objects

I'm trying to determine the best way to store large numbers of small .mat files, around 9000 objects with sizes ranging from 2k to 100k, for a total of around half a gig.
The typical use case is that I only need to pull a small number (say 10) of the files from disk at a time.
What I've tried:
Method 1: If I save each file individually, I get performance problems (very slow save times and system sluggishness for some time after), as Windows 7 has difficulty handling so many files in a folder (and I think my SSD is having a rough time of it, too). However, the end result is fine: I can load what I need very quickly. This is using '-v6' save.
Method 2: If I save all of the files in one .mat file and then load just the variables I need, access is very slow (loading takes around three quarters of the time it takes to load the whole file, with small variation depending on the ordering of the save). This is using '-v6' save, too.
I know I could split the files up into many folders but it seems like such a nasty hack (and won't fix the SSD's dislike of writing many small files), is there a better way?
Edit:
The objects consist mainly of a numeric matrix of double data and an accompanying vector of uint32 identifiers, plus a bunch of small identifying properties (char and numeric).
Five ideas to consider:
Try storing in an HDF5 object - take a look at http://www.mathworks.com/help/techdoc/ref/hdf5.html - you may find that this solves all of your problems. It will also be compatible with many other systems (e.g. Python, Java, R).
A variation on your method #2 is to store them in one or more files, but to turn off compression.
Different datatypes: It may also be the case that you have some objects that compress or decompress inexplicably poorly. I have had such issues with either cell arrays or struct arrays. I eventually found a way around it, but it's been a while and I can't remember how to reproduce this particular problem. The solution was to use a different data structure.
@SB proposed a database. If all else fails, try that. I don't like building external dependencies and additional interfaces, but it should work (the primary problem is that if the DB starts to groan or corrupts your data, then you're back at square one). For this purpose consider SQLite, which doesn't require a separate server/client framework. There is an interface available on Matlab Central: http://www.mathworks.com/matlabcentral/linkexchange/links/1549-matlab-sqlite
(New) Considering that the objects total less than 1 GB, it may be easier to just copy the entire set to a RAM disk and then access them through that. Just remember to copy anything you save back off the RAM disk (or wrap save so that it writes objects to both places).
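Idea 1 (HDF5) might look like this with MATLAB's high-level interface (dataset names are made up; h5create/h5write/h5read require R2011a or later):

```matlab
data = rand(512, 512);             % one object's numeric matrix

% One dataset per object inside a single container file
h5create('objects.h5', '/obj0001/matrix', size(data));
h5write('objects.h5', '/obj0001/matrix', data);

% Later: read back just the one object you need,
% without loading the rest of the file
data = h5read('objects.h5', '/obj0001/matrix');
```

This gives you the "many small objects, one file" layout of method #2 while keeping per-object random access.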
Update: The OP has mentioned custom objects. There are two methods to consider for serializing these:
Two serialization programs from Matlab Central: http://www.mathworks.com/matlabcentral/fileexchange/29457 (which was inspired by http://www.mathworks.com/matlabcentral/fileexchange/12063-serialize)
Google's Protocol Buffers. Take a look here: http://code.google.com/p/protobuf-matlab/
Try storing them as blobs in a database.
I would also try the multiple folders method as well - it might perform better than you think. It might also help with organization of the files if that's something you need.
The solution I have come up with is to save object arrays of around 100 of the objects each. These files tend to be 5-6 meg so loading is not prohibitive and access is just a matter of loading the right array(s) and then subsetting them to the desired entry(ies). This compromise avoids writing too many small files, still allows for fast access of single objects and avoids any extra database or serialization overhead.
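A sketch of that batching scheme (file and variable names are made up; `objArray` holds ~100 objects per file):

```matlab
% Save: group roughly 100 objects per .mat file
save('batch_013.mat', 'objArray');       % objArray is e.g. a 1x100 object array

% Load: pull in one 5-6 MB batch, then subset to the entries you need
S = load('batch_013.mat', 'objArray');
wanted = S.objArray([3 27 64]);          % indices of the desired objects
```

The per-file load cost is amortised over ~100 objects, which is the compromise described above.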