Efficient way to store single matrices generated in a loop in Matlab? - matlab

I would like to know whether there is a way to reduce the amount of memory used by the following piece of code in Matlab:
n=3;
T=100;
r=T*2;
b=80;
BS=1000
bsuppostmp_=cell(1,BS);
bslowpostmp_=cell(1,BS);
bsuppnegtmp_=cell(1,BS);
bslownegtmp_=cell(1,BS);
for w=1:BS
bsuppostmp_{w}= randi([0,1],n*T,2^(n-1),r,b);
bslowpostmp_{w}=randi([0,3],n*T,2^(n-1),r,b);
bsuppnegtmp_{w}=randi([0,4],n*T,2^(n-1),r,b);
bslownegtmp_{w}=randi([0,2],n*T,2^(n-1),r,b);
end
I have decided to use cells of matrices because after this loop I need to call separately each single matrix in another loop.
If I run this code I get the message error "Your system has run out of application memory".
Do you know a more efficient (in terms of memory) way to store each single matrix?

Let's refer the page about Strategies for Efficient Use of Memory:
Because simple numeric arrays (comprising one mxArray) have the least overhead, you should use them wherever possible. When data is too complex to store in a simple array (or matrix), you can use other data structures.
Cell arrays are comprised of separate mxArrays for each element. As a result, cell arrays with many small elements have a large overhead.
I doubt that the overhead for cell array is really large ...
Let me give a possible explanation. What if Matlab cannot use the swap file in case of storing the 4D arrays into a cell array? When storing large numeric arrays, there is no out-of-memory error because Matlab uses the swap file for caching each variable when the used memory becomes too big. Whereas if each 4D array is stored in a super cell array, Matlab sees it as a single variable and cannot fragment it part in the RAM and part in the swap file. Ok I don't work at Mathworks so I don't know if I'm right or not, it's just an idea about this topic so I would be glad to know what is the real reason.
So my advice is the same as other comments: try to free matrices as soon as you've done with them. There is not so many possibilities to store many dense arrays: one big array (NOT recommended here, will reach out-of-memory sooner because Matlab makes it contiguous), cell array or struct array (and if I correctly understand the documentation, the overhead can be equivalent). In all cases, the data amount over all 4D arrays is really large, so the best thing to do is to care about keeping the memory constantly as low as possible by discarding some data once they are used and keep in memory only the results of computation (in case they take lower memory usage ...).

Related

Multidimensional array or cell array in Matlab?

I want to create a multidimensional array A in Matlab of dimension NxMxG with N,M,G very large (e.g. 10^6).
Then I need to access Ain a loop as
for g=1:G
Atemp=A(:,:,g);
%etc etc
end
What is more convenient in terms of speed and memory between storing the values of A in a multidimensional array or in a cell array?
If you always loop on slices in the same way, and process them one at a time, as your bit of code seem to suggest, then the performance should be roughly equivalent.
If you really intend to store 1e6x1e6x1e6 double's, Matlab is definitely probably not your tool. However, if slices are sparse, then it's probably a bit more efficient to store them as a cell array, so Matlab does not have to search the full 3D space when "cutting" the slice, and Atemp=A{g}; simply copies a sparse matrix.
If you are working on full (nonsparse) slices then probably you should load/save your slice to disk and use instead a function/support class which loads from file: Atemp=A(g);. Mind that text loading takes up much more time than loading a binary file: so choose your file format carefully!
If you use numbers, a multidimensional array is the right thing to use. A cell array also allows other types, so is less optimised for numbers only. Because you are using very large arrays, maybe a sparse matrix may be appropriate for you.
First, note that neither pick will let you handle 10^18 values. You don't have exabytes of storage, let alone memory.
If you will ONLY ever use it as Atemp = A(:,:,g); with N and M always the same size for all g, having it multi-dimensional or cell shouldn't change anything meaningful as far as performance goes. N-D will be probably a bit faster, but nothing significant.
Obviously, if you ever want to have computation with different sizes of N, M depending on g, you need to pick cell array. And if you want to have computation with say Atemp = squeeze(A(:,g,:)); N-D array is clear choice here.
So, choice most likely depends if you prefer doing A(:,:,g) or A{g};, which depends on your meaning of data. Say if you have weather data and currently only care about what happens at specific height (not what happens between the layers), A(:,:,g) is clearly more sensible. It is possible you will require inter-layer calculations at some point. But if you have instead g meaning different measurement sites gathering data, A{g} should be used to pick the site. You will likely have some sites larger or smaller eventually.

matlab cell array size much larger than actual data

I recently discovered an strange behavour of MATLAB cell arrays that was not happening before.
If I create a cell array with
a=cell(1,4)
its size is 32 bytes.
If then I put something inside, e.g.
a{2}='abcd'
its size becomes 144 bytes. But if I remove this content by putting
a{2}=[]
the size becomes 132 bytes and so on. What is the problem?
Simply put, the Matlab cell array needs some internal data structures to keep track of what is stored within.
As it seems, Matlab allocates memory as needed, and thus extends the storage needed by the cell array as you insert data.
Removing the data doesn't mean that matlab can return the now unused memory to the OS or internal memory pool -- that might either be something that is impossible with the internal storage structure, or something that would be unwise with respect to performance, because cell arrays from which data is removed are (speaking over all use cases of cell arrays) be structures that get updated often, so that "prematurely" returning memory just to acquire it back again a few instructions later would be pretty CPU-intense.
As a general note: Matlab has pretty terrible storage approaches for nearly everything but matrices and sparse matrices (vectors of course being special cases of matrices). That's because it's not Matlab's job to be e.g. a string parser etc.
If memory becomes a problem, it might be worth considering implementing the math core of your problem in Matlab and doing the rest in other, more generally usable programming languages and somehow interfacing your Matlab code with that -- I haven't tried that myself, but Mathworks has a Matlab engine for python, and I'd take writing python for things like storing arbitrary data over using Matlab every day; with that engine, you can call Matlab to do your dirty math work, and use python to do your everyday scripting/programming work.
Notice that my bottom line here is that Matlab has great Math routines and impressive documentation, but if you want to actually develop software, using a general purpose tool/language is much more likely to be satisfying quickly.
I'd even go as far as saying that it's probably worth your time to learn python, just to be able to circumvent having to deal with things that Matlab wasn't designed for (and cell arrays are a prime example of what Matlab is really complicated about and what's extremely easy in python).
You use
a{2}=[]
to 'kill' the data in that field. In reality you actually do access the data, that is you leave a non-empty cell entry with an empty double array. (Thanks to matlab for representing empty cells as empty doubles...)
but if you use (no curly braces, but parentheses):
a(2) = cell(1,1)
then the cell array size is back to "empty" = 32 bytes.

Matlab memory management

I am simulating a set of differential equations in Matlab, for which I will save a struct of at least 400 x 80 000 x 24 doubles.
What in your opinion is the easiest way to control memory load? Memory mapping or a parallel process for memory checking, data writing and clearing? The program is single thread, but potentially will be re-written for a parallel computing.
Problem statement
Here are two problems, I think you are facing one of these:
Your data is really a block of mxnxz
Your data is not really a block of mxnxz
Solution for 1
If your data is really a block form, it is likely the best solution to store your data in a matrix.
Solution for 2
If your data is not a nice block, there are some choices to be made.
If your data is nearly a nice block, (for example mx(0.99~1.01)nxz, still consider using a matrix. Think about padding the gaps with zeros or NaN values.
If your data is very much not a block, (for example mx(0.01~100)nxz, consider using a more flexible data structure.
On flexible data structure usage
The trick to using data in a flexible way, is to try to identify the big matrices (that may vary in size) and let those be regular matrices. In your case the data is about 400 x 80000 x 24, hence you will definitely want the 80000 to be the dimension of a simple storage structure. The 24 and 400 are quite small so we don't care whether they are flexible or not.
Conclusion
The most efficient data structure that is not very flexible is a matrix of 400x80000x24
The most flexible data structure that is still fairly efficient is a cell array of 400x24, that contains vectors of approximately 80000x1
The most efficient data structure that is still fairly flexible is a cell array of 1x24 that contains matrices of approximately 400x80000. As 24 is small you could even use a struct for this in a meaningfull way, but usually a cell array will make more sense.

How is a bitmapped vector trie faster than a plain vector?

It's supposedly faster than a vector, but I don't really understand how locality of reference is supposed to help this (since a vector is by definition the most locally packed data possible -- every element is packed next to the succeeding element, with no extra space between).
Is the benchmark assuming a specific usage pattern or something similar?
How this is possible?
bitmapped vector tries aren't strictly faster than normal vectors, at least not at everything. It depends on what operation you are considering.
Conventional vectors are faster, for example, at accessing a data element at a specific index. It's hard to beat a straight indexed array lookup. And from a cache locality perspective, big arrays are pretty good if all you are doing is looping over them sequentially.
However a bitmapped vector trie will be much faster for other operations (thanks to structural sharing) - for example creating a new copy with a single changed element without affecting the original data structure is O(log32 n) vs. O(n) for a traditional vector. That's a huge win.
Here's an excellent video well worth watching on the topic, which includes a lot of the motivation of why you might want these kind of structures in your language: Persistent Data Structures and Managed References (talk by Rich Hickey).
There is a lot of good stuff in the other answers but nobdy answers your question. The PersistenVectors are only fast for lots of random lookups by index (when the array is big). "How can that be?" you might ask. "A normal flat array only needs to move a pointer, the PersistentVector has to go through multiple steps."
The answer is "Cache Locality".
The cache always gets a range from memory. If you have a big array it does not fit the cache. So if you want to get item x and item y you have to reload the whole cache. That's because the array is always sequential in memory.
Now with the PVector that's diffrent. There are lots of small arrays floating around and the JVM is smart about that and puts them close to each other in memory. So for random accesses this is fast; if you run through it sequentially it's much slower.
I have to say that I'm not an expert on hardware or how the JVM handles cache locality and I have never benchmarked this myself; I am just retelling stuff I've heard from other people :)
Edit: mikera mentions that too.
Edit 2: See this talk about Functional Data-Structures, skip to the last part if you are only intrested in the vector. http://www.infoq.com/presentations/Functional-Data-Structures-in-Scala
A bitmapped vector trie (aka a persistent vector) is a data structure invented by Rich Hickey for Clojure, that has been implementated in Scala since 2010 (v 2.8). It is its clever bitwise indexing strategy that allows for highly efficient access and modification of large data sets.
From Understanding Clojure's Persistent Vectors :
Mutable vectors and ArrayLists are generally just arrays which grows
and shrinks when needed. This works great when you want mutability,
but is a big problem when you want persistence. You get slow
modification operations because you'll have to copy the whole array
all the time, and it will use a lot of memory. It would be ideal to
somehow avoid redundancy as much as possible without losing
performance when looking up values, along with fast operations. That
is exactly what Clojure's persistent vector does, and it is done
through balanced, ordered trees.
The idea is to implement a structure which is similar to a binary
tree. The only difference is that the interior nodes in the tree have
a reference to at most two subnodes, and does not contain any elements
themselves. The leaf nodes contain at most two elements. The elements
are in order, which means that the first element is the first element
in the leftmost leaf, and the last element is the rightmost element in
the rightmost leaf. For now, we require that all leaf nodes are at the
same depth2. As an example, take a look at the tree below: It has
the integers 0 to 8 in it, where 0 is the first element and 8 the
last. The number 9 is the vector size:
If we wanted to add a new element to the end of this vector and we
were in the mutable world, we would insert 9 in the rightmost leaf
node, like this:
But here's the issue: We cannot do that if we want to be persistent.
And this would obviously not work if we wanted to update an element!
We would need to copy the whole structure, or at least parts of it.
To minimize copying while retaining full persistence, we perform path
copying: We copy all nodes on the path down to the value we're about
to update or insert, and replace the value with the new one when we're
at the bottom. A result of multiple insertions is shown below. Here,
the vector with 7 elements share structure with a vector with 10
elements:
The pink coloured nodes are shared between the vectors, whereas the
brown and blue are separate. Other vectors not visualized may also
share nodes with these vectors.
More info
Besides Understanding Clojure's Persistent Vectors, the ideas behind this data structure and its use cases are also explained pretty well in David Nolen's 2014 lecture Immutability, interactivity & JavaScript, from which the screenshot below was taken. Or if you really want to dive deeply into the technical details, see also Phil Bagwell's Ideal Hash Trees, which was the paper upon which Hickey's initial Clojure implementation was based.
What do you mean by "plain vector"? Just a flat array of items? That's great if you never update it, but if you ever change a 1M-element flat-vector you have to do a lot of copying; the tree exists to allow you to share most of the structure.
Short explanation: it uses the fact that the JVM optimizes so hard on read/write/copy array data structures. The key aspect IMO is that if your vector grows to a certain size index management becomes a  bottleneck . Here comes the very clever algorithm from persisted vector into play, on very large collections it outperforms the standard variant. So basically it is a functional data-structure which only performed so well because it is built up on small mutable highly optimizes JVM datastructures.
For further details see here (at the end)
http://topsy.com/vimeo.com/28760673
Judging by the title of the talk, it's talking about Scala vectors, which aren't even close to "the most locally packed data possible": see source at https://lampsvn.epfl.ch/trac/scala/browser/scala/tags/R_2_9_1_final/src/library/scala/collection/immutable/Vector.scala.
Your definition only applies to Lisps (as far as I know).

Where do I find the memory requirements of a MATLAB function?

I have a 3D array of values (0 or 1), which is very large (approx 2300x2300x11). I want to fit a surface to these values using for example interp3, but when I try MATLAB runs out of memory. Thus, I've decided to reduce the size of my array enough for MATLAB to accomodate it in memory.
Now, the smaller I make the reduced array, the worse my results will be (the surface fitting is part of a measurement process with high precision requirements), so I want to reduce the array as little as possible.
Is there any way to determine on beforehand how much memory a certain array size will demand and how much memory is available, and then use this information to resize the array enough to avoid out of memory exceptions, but not more?
I don't know the answer to this, but I wonder if you can have your cake and eat it, too.
If your data set is too big, why not do a piecewise fit? Do it in chunks rather than omitting data points.
Or be smarter about how you omit data points. You want them in areas of high curvature - where your data is changing fastest. Leave out points in areas far away from the action, where nothing interesting is happening. You might have to do a fit, look at the surface, add and remove more points and try again.
It might an iterative process, but I'll bet you'll be able to get a nice fit with a little luck and effort.
You can look at the maximum array sizes that are supported on different platforms. In general, if you have a PxQxR sized 3D array of doubles, then the size of your array in bytes is P*Q*R*8. For your matrix, the size is ~ 444 MB. You can also try reducing it to a single, using single(A). single uses 4 bytes per element and you can reduce the size of your array by a factor 2.
I haven't really poked into the inner workings of interp3, but the exact memory requirements will depend on the interpolation option you choose. So, you can first try to convert it to single and see if it works. If not, try with 80% (90%) of the number of rows and columns. This way you have a good chunk of the original array, but the memory requirement is only 64% (81%) of the original.
If that doesn't help, duffymo's suggestion is what you should be looking into.