What is the best way to preallocate a table in Matlab? - matlab

Usually I preallocate using cell(), zeros() or ones() depending on the type of data, but what is the best way to preallocate a table as it can hold various data structures?
I am talking about the table() functionality added in Matlab 2013b.
Obviously I can reserve memory using code like this:
T = table(cell(x,y))
but when my table is supposed to hold various datatypes I run into problems. Just imagine I want to fill in a column of integers now, or like in my case fill each row with an observation containing a string, an integer and a floating point number. T

How should Matlab know, how much memory to allocate, when you don't want to tell it what data is stored in the table? I don't think there is a good answer to your question besides "don't do it". If you know what is stored in each column, create the variables and add the rows as you go.
Or create the Data in preallocated Matrixes/Cells and create the table from them at the end.

Related

MATLAB: Growing hash table

I use a hash table in my code and when the code is running I add keys and values into the hash table. At first, I thought using a hash table make my code faster but I was wrong and using hash table has made it slower. As I searched about it, I realized that increasing size of the hash table and writing it takes time because when it gets larger, MATLAB seeks for a bigger space and seeking for a bigger space wastes time.
Is there any preallocating method for hash tables in MATLAB?
Thanks.
I assume you mean that you're using the built-in containers.Map object as your hashtable. While there is no direct means to pre-allocate such an object, I suggest that you use either a simple two-column cell-array, or a java.util.Hashtable object, both of which are much faster in general than containers.Map.
Reference:
https://undocumentedmatlab.com/blog/using-java-collections-in-matlab

Multidimensional array or cell array in Matlab?

I want to create a multidimensional array A in Matlab of dimension NxMxG with N,M,G very large (e.g. 10^6).
Then I need to access Ain a loop as
for g=1:G
Atemp=A(:,:,g);
%etc etc
end
What is more convenient in terms of speed and memory between storing the values of A in a multidimensional array or in a cell array?
If you always loop on slices in the same way, and process them one at a time, as your bit of code seem to suggest, then the performance should be roughly equivalent.
If you really intend to store 1e6x1e6x1e6 double's, Matlab is definitely probably not your tool. However, if slices are sparse, then it's probably a bit more efficient to store them as a cell array, so Matlab does not have to search the full 3D space when "cutting" the slice, and Atemp=A{g}; simply copies a sparse matrix.
If you are working on full (nonsparse) slices then probably you should load/save your slice to disk and use instead a function/support class which loads from file: Atemp=A(g);. Mind that text loading takes up much more time than loading a binary file: so choose your file format carefully!
If you use numbers, a multidimensional array is the right thing to use. A cell array also allows other types, so is less optimised for numbers only. Because you are using very large arrays, maybe a sparse matrix may be appropriate for you.
First, note that neither pick will let you handle 10^18 values. You don't have exabytes of storage, let alone memory.
If you will ONLY ever use it as Atemp = A(:,:,g); with N and M always the same size for all g, having it multi-dimensional or cell shouldn't change anything meaningful as far as performance goes. N-D will be probably a bit faster, but nothing significant.
Obviously, if you ever want to have computation with different sizes of N, M depending on g, you need to pick cell array. And if you want to have computation with say Atemp = squeeze(A(:,g,:)); N-D array is clear choice here.
So, choice most likely depends if you prefer doing A(:,:,g) or A{g};, which depends on your meaning of data. Say if you have weather data and currently only care about what happens at specific height (not what happens between the layers), A(:,:,g) is clearly more sensible. It is possible you will require inter-layer calculations at some point. But if you have instead g meaning different measurement sites gathering data, A{g} should be used to pick the site. You will likely have some sites larger or smaller eventually.

Sparse table in MATLAB, is it possible?

\ am dealing with a matrix in MATLAB which is sparse and has many rows and columns. In this case, the row and columns of the matrix are the ids for particular items. Let's assume them as id1 and id2.
It would be nice if the ids for rows and columns could be embedded so I can have access to them easily to them without the need for creating extra variables that keep the two ids.
The answer would be probably to use a table data type. Tables are very ideal answer for my need however I was wondering if I could create a table data type for a sparse matrix?
A [m*n] sparse matrix %% m & n are huge
id1 [1*m] , id2 [1*n] %% two vectors containing numeric ids for rows and column
Could we obtain?
T [m*n] sparse table matrix
Thanks for sharing your view with me.
I will address the question and the comments in order to clear some confusion.
The short answer
There is no sparse table class in Matlab. Cannot do. Use sparse() matrices.
The long answer
There is a reason why sparse tables make little sense:
Philosophically speaking, the advantage of having nice row and column labels, is completely lost if you are working with a big panel of data and/or if the data is sparse.
Scrolling through 246829 rows and 33336 columns? Can only be useful at very isolated times if you are debugging your code and a specific outlier is causing you results to go off. Also, all you might see is just a sea of zeros.
Technically a table can have more columns for the same variable, i.e. table(rand(10,2), rand(10,1)) is a valid table. How would you consider define sparsity on such table?
Fine, suppose you are working with a matrix-like table, i.e. one element per table cell and same numeric class. Still, none of the algebraic operators are defined on a table(). So you need to extract the content first, in order to be able to perform any operation that spans more than a single column of data. Just to be clear, once the data is extracted, then you have e.g. your double (full) matrix or in an ideal case a double sparse matrix.
Now, a few misconceptions to clear:
Less variables implies clearer/cleaner code. Not true. You are probably thinking about the extreme case (in bad practices) of how do I make a series of variables a1, a2, a3, etc..
There is a sweet spot between verbosity and number of variables, amount of comments, and code clarity/maintainability. Only with time and experience you find the right balance.
Control over data cannot go without visual inspection. This approach does NOT scale with big data and the sooner you abandon it, the faster your code will become more reliable. You need to verify your results systematically, rather than relying on visual inspection. Failure to (visually) spot a problem in the data, grows exponentially with its dimension, faster than with systematic tests.
Some background info on my work:
I work with high-frequency prices, that's terabytes of data. I also extended the table() class with additional methods and fixes to help me with my work (see https://github.com/okomarov/tableutils), but I do not see how sparsity is a useful feature to add to table().

Efficient way to store single matrices generated in a loop in Matlab?

I would like to know whether there is a way to reduce the amount of memory used by the following piece of code in Matlab:
n=3;
T=100;
r=T*2;
b=80;
BS=1000
bsuppostmp_=cell(1,BS);
bslowpostmp_=cell(1,BS);
bsuppnegtmp_=cell(1,BS);
bslownegtmp_=cell(1,BS);
for w=1:BS
bsuppostmp_{w}= randi([0,1],n*T,2^(n-1),r,b);
bslowpostmp_{w}=randi([0,3],n*T,2^(n-1),r,b);
bsuppnegtmp_{w}=randi([0,4],n*T,2^(n-1),r,b);
bslownegtmp_{w}=randi([0,2],n*T,2^(n-1),r,b);
end
I have decided to use cells of matrices because after this loop I need to call separately each single matrix in another loop.
If I run this code I get the message error "Your system has run out of application memory".
Do you know a more efficient (in terms of memory) way to store each single matrix?
Let's refer the page about Strategies for Efficient Use of Memory:
Because simple numeric arrays (comprising one mxArray) have the least overhead, you should use them wherever possible. When data is too complex to store in a simple array (or matrix), you can use other data structures.
Cell arrays are comprised of separate mxArrays for each element. As a result, cell arrays with many small elements have a large overhead.
I doubt that the overhead for cell array is really large ...
Let me give a possible explanation. What if Matlab cannot use the swap file in case of storing the 4D arrays into a cell array? When storing large numeric arrays, there is no out-of-memory error because Matlab uses the swap file for caching each variable when the used memory becomes too big. Whereas if each 4D array is stored in a super cell array, Matlab sees it as a single variable and cannot fragment it part in the RAM and part in the swap file. Ok I don't work at Mathworks so I don't know if I'm right or not, it's just an idea about this topic so I would be glad to know what is the real reason.
So my advice is the same as other comments: try to free matrices as soon as you've done with them. There is not so many possibilities to store many dense arrays: one big array (NOT recommended here, will reach out-of-memory sooner because Matlab makes it contiguous), cell array or struct array (and if I correctly understand the documentation, the overhead can be equivalent). In all cases, the data amount over all 4D arrays is really large, so the best thing to do is to care about keeping the memory constantly as low as possible by discarding some data once they are used and keep in memory only the results of computation (in case they take lower memory usage ...).

Efficient Access of elements in Matrix in Matlab

I have an m x n matrix of integers and where n is a fairly big number m and n ~1000. I want to iterate through all of these and perform a some operations, like accessing a particular cell and assigning a value of a particular cells.
However, at least in my implementation, this is rather inefficient as I have two for loops with Matrix(a,b) = Matrix(a,b+1) or something along these lines. Is there any other way to do this seeing as my current implementation takes a long time to traverse through about 100,000 cells and perform some operations.
Thank you
In matlab, it's almost always possible to avoid loops.
If you want to do Matrix(a,b)=Matrix(a,b+1), you should just do Matrix2=Matrix(:,2:end);
If you are more precise about what you do inside the loop, I can help you more.
Matlab uses column major ordering of matrixes in memory (unlike C). Are you sure you are iterating the indexes in the correct order? If not, try switching them and see if performance improves..
If you can't get rid of the for loops, one possibility would be to rewrite the expensive operations in C and create a MEX file as described here.