Incremental appending: How to avoid performance penalty of struct arrays

Incremental appending: How to avoid performance penalty of struct arrays - matlab

If you must incrementally append data to arrays, it seems that using individual vectors of basic data types is orders of magnitude faster than an array of structs (with one vector element per record). Even trying to collect the individual vectors into a struct seems to double the time. The tests are:
N=5e4;
fprintf('\nstruct array (array of structs):\n')
clear x y;
y=struct( 'a',[], 'b',[], 'c',[], 'd',[] );
tic
for iIns = 1 : N
x.a=rand; x.b=rand; x.c=rand; x.d=rand;
y(end+1)=x;
end % for iIns
toc
fprintf('\nSeparate arrays of scalars:\n')
clear a b c d;
a=[]; b=[]; c=[]; d=[];
tic
for iIns = 1 : N
a(end+1) = rand;
b(end+1) = rand;
c(end+1) = rand;
d(end+1) = rand;
end % for iIns
toc
fprintf('\nA struct with arrays of scalars for fields:\n')
clear a b c d x y
x.a=[]; x.b=[]; x.c=[]; x.d=[];
tic
for iIns = 1:N
x.a(end+1)=rand;
x.b(end+1)=rand;
x.c(end+1)=rand;
x.d(end+1)=rand;
end % for iIns
toc
The results:
struct array (array of structs):
Elapsed time is 24.127274 seconds.
Separate arrays of scalars:
Elapsed time is 0.048190 seconds.
A struct with arrays of scalars for fields:
Elapsed time is 0.084624 seconds.
Even though collecting individual vectors of basic data types into a struct (3rd scenario above) imposes such a penalty, it may be preferrable to simply using individual vectors (second scenario above) because the variables are more organized. Your variable name space isn't filled up with so many variables which are in fact conceptually grouped.
That's quite a significant penalty, however, to pay for such organization. I don't suppose there is way to avoid this?

There are two ways to avoid this performance penalty: (1) pre-allocate, and (2) rethink your stance on "organizing" variables. I suggest both. Oh, and if you can, don't use arrays of structs where each field only uses scalars - if your application suddenly has to handle a couple of orders of magnitude more data, the memory overhead will force you to rewrite everything.
Pre-allocation
You often know how many elements your array will end up having. Thus, initialize your arrays as s = struct('a',NaN(1:N),'b',NaN(1:N)); If you don't know ahead of time how many entries there will be, but you can estimate an upper limit, initialize with the upper limit, and either remove the elements, or use functions (e.g. nanmean) that do not care if the array has a few extra NaNs in the end. If you truly know nothing about the final size (except that N will be large enough to matter), pre-allocate with a nice number (e.g. N=1337), and extend the array in chunks. MathWorks have sped up dynamic growing of numeric arrays in a recent release, but as you demonstrate in your answer, the optimization has not been applied to structs yet. Don't count MathWorks' optimization team to fix your code.
Nice variables
Why worry about your variable space? As long as you use explicitVariableNames, your code remains readable and you will have an easy time picking out the right variable. But ok, let's say you want to clean up: The first way to keeping the number of active variables low is to use clear or keep at strategic points in your code to make sure you only keep around what's needed. The second (assuming you want to optimize for performance), is to put contextually linked vectors into the same array: objectDimensions = [lengthOfObject, widthOfObject, heightOfObject]. This keeps everything as numeric arrays (which are fastest), and allows easy vectorization such as objectVolume = prod(objectDimensions,2);.
/aside: I should disclose that I used to use structures frequently for assembling results (so that I could return a lot of information a single variable and have the field names be part of the documentation). I have since switched to use object-oriented-programming (usually handle-objects), which no only collect related variables, but also the associated functionality, and which facilitate code re-use. I do take a performance hit, but the time it saves me coding makes more than up for it. Note that I do pre-allocate if at all possible (and if it's not just growing an array three times).
Example
Assume you have a function getDimensions that reads dimensions (length, height, width) of objects. However, sometimes, the object is 2D, sometimes it is 3D. Thus, you want to fill the following variables: twoD.length, twoD.width, threeD.length, threeD.width, threeD.height, ideally as arrays of structs, so that each element of a struct corresponds to an object. You do not know ahead of time how many objects there are, all you can do is poll the function thereAreMoreObjects, which returns true or false, until there are no more objects.
Here's how you can do this with reasonable efficiency and growing arrays by chunks:
%// preassign the temporary variable, and some others
chunkSize = 1000;
numObjects = 0;
idAndDimensions = zeros(chunkSize,4);
while thereAreMoreObjects()
objectId = getCurrentObjectId();
%// hi==-1 if it's flat
[len,wid,hi] = getObjectDimensions(objectId);
%// allocate more, if needed
numObjects = numObjects + 1;
if numObjects > size(idAndDimensions,1)
%// grow array
idAndDimensions(end+chunkSize,1) = 0;
end
idAndDimensions(numObjects,:) = [objectId, len, wid, hi];
end
%// throw away excess
idAndDimensions = idAndDimensions(1:numObjects,:);
%// split into 2D and 3D objects
isTwoD = numObjects(:,end) == -1;
%// assign twoD struct
twoD = struct('id',num2cell(idAndDimensions(isTwoD,1),...
'length',num2cell(idAndDimensions(isTwoD,2),...
'width',num2cell(idAndDimensions(isTwoD,3));
%// assign threeD struct
%// clean up - we need only the two structs
%// I use keep from the File Exchange instead of clearvars
clearvars -except twoD threeD

Related

How can I prevent memory churn in MATLAB?

I'm putting together a hypothesis tree type algorithm in MATLAB and it is being slowed terribly by memory issues. The profiler shows all time being spent just writing into arrays.
The algorithm keeps a list of hypotheses with information about them in an array of structs. The issue is related to 3D arrays (not big) within the hypothesis:
H(x).someInfo(a,b,c)
Each iteration, some hypotheses are discarded:
H = H(keepIndices);
And the ones that remain are expanded and updated:
Hin = H;
H(N*length(H)) = H(1); % Pre-alloc?
count = 0;
for x = 1:length(Hin)
for y = 1:N
count = count + 1;
H(count) = Hin(x);
... % Computations
H(count).someInfo(:,:,a) = M; % Much time spent here
end
end
The profiler indicates huge amounts of time spent just doing the write (note comment). "someInfo" is preallocated so is not itself growing dynamically, but it is getting copied around.
Can anyone suggest a way to achieve this type of functionality without getting crossways with inefficiencies in the way MATLAB deals with memory? Not blaming MATLAB, but its flexibility makes this harder than it would be in C++.

If the access pattern to someInfo is always the same, you could turn it into a cell array of 2D matrices. You'll find that
H(count).someInfo{a} = M;
is faster than
H(count).someInfo(:,:,a) = M;
because the array data is not copied over, only the reference to the data is.
...and if that is the case, you might want to do
H{count,a} = M;
Note that the fewer levels of indexing (you have 3!), the faster it is.

Matlab vectorization of multiple embedded for loops

Suppose you have 5 vectors: v_1, v_2, v_3, v_4 and v_5. These vectors each contain a range of values from a minimum to a maximum. So for example:
v_1 = minimum_value:step:maximum_value;
Each of these vectors uses the same step size but has a different minimum and maximum value. Thus they are each of a different length.
A function F(v_1, v_2, v_3, v_4, v_5) is dependant on these vectors and can use any combination of the elements within them. (Apologies for the poor explanation). I am trying to find the maximum value of F and record the values which resulted in it. My current approach has been to use multiple embedded for loops as shown to work out the function for every combination of the vectors elements:
% Set the temp value to a small value
temp = 0;
% For every combination of the five vectors use the equation. If the result
% is greater than the one calculated previously, store it along with the values
% (postitions) of elements within the vectors
for a=1:length(v_1)
for b=1:length(v_2)
for c=1:length(v_3)
for d=1:length(v_4)
for e=1:length(v_5)
% The function is a combination of trigonometrics, summations,
% multiplications etc..
Result = F(v_1(a), v_2(b), v_3(c), v_4(d), v_5(e))
% If the value of Result is greater that the previous value,
% store it and record the values of 'a','b','c','d' and 'e'
if Result > temp;
temp = Result;
f = a;
g = b;
h = c;
i = d;
j = e;
end
end
end
end
end
end
This gets incredibly slow, for small step sizes. If there are around 100 elements in each vector the number of combinations is around 100*100*100*100*100. This is a problem as I need small step values to get a suitably converged answer.
I was wondering if it was possible to speed this up using Vectorization, or any other method. I was also looking at generating the combinations prior to the calculation but this seemed even slower than my current method. I haven't used Matlab for a long time but just looking at the number of embedded for loops makes me think that this can definitely be sped up. Thank you for the suggestions.

No matter how you generate your parameter combination, you will end up calling your function F 100^5 times. The easiest solution would be to use parfor instead in order to exploit multi-core calculation. If you do that, you should store the calculation results and find the maximum after the loop, because your current approach would not be thread-safe.
Having said that and not knowing anything about your actual problem, I would advise you to implement a more structured approach, like first finding a coarse solution with a bigger step size and narrowing it down successivley by reducing the min/max values of your parameter intervals. What you have currently is the absolute brute-force method which will never be very effective.

Preallocation of cell array in matlab

This is more a question to understand a behavior rather than a specific problem.
Mathworks states that numerical are stored continuous which makes preallocation important. This is not the case for cell arrays.
Are they something similar than vector or array of pointers in C++?
This would mean that prealocation is not so important since a pointer is half the size of a double (according to whos - but there surely is overhead somewhere to store the datatype of the mxArray).
Running this code:
clear all
n = 1e6;
tic
A = [];
for i=1:n
A(end + 1) = 1;
end
fprintf('Numerical without preallocation %f s\n',toc)
clear A
tic
A = zeros(1,n);
for i=1:n
A(i) = 1;
end
fprintf('Numerical with preallocation %f s\n',toc)
clear A
tic
A = cell(0);
for i=1:n
A{end + 1} = 1;
end
fprintf('Cell without preallocation %f s\n',toc)
tic
A = cell(1,n);
for i=1:n
A{i} = 1;
end
fprintf('Cell with preallocation %f s\n',toc)
returns:
Numerical without preallocation 0.429240 s
Numerical with preallocation 0.025236 s
Cell without preallocation 4.960297 s
Cell with preallocation 0.554257 s
There is no surprise for the numerical values. But the did surprise me since only the container of the pointers and not the data itself would need reallocation. Which should (since the pointer is smaller than a double) lead to difference of <.2s. Where does this overhead come from?
A related question would be, if I would like to make a data container for heterogeneous data in Matlab (preallocation is not possible since the final size is not known in the beginning). I think handle classes are not good since the also have huge overhead.
already looking forward to learn something
magu_
Edit:
I tried out the linked list proposed by Eitan T but I think the overhead from matlab is still rather big. I tried something with an double array as data (rand(200000,1)).
I made a little plot to illustrate:
code for the graph: (I used the dlnode class from the matlab hompage as stated in the answering post)
D = rand(200000,1);
s = linspace(10,20000,50);
nC = zeros(50,1);
nL = zeros(50,1);
for i = 1:50
a = cell(0);
tic
for ii = 1:s(i)
a{end + 1} = D;
end
nC(i) = toc;
a = list([]);
tic
for ii = 1:s(i)
a.insertAfter(list(D));
end
nL(i) = toc;
end
figure
plot(s,nC,'r',s,nL,'g')
xlabel('#iter')
ylabel('time (s)')
legend({'cell' 'list'})
Don't get me wrong I love the idea of linked list, since there are rather flexible, but I think the overhead might be to big.

Are cell arrays something similar to a vector or an array of pointers in C++?
Cell arrays allow storing data of different types and sizes indeed, but each cell also adds a constant overhead of 112 bytes (see this other answer of mine). This is far more than an 8-byte double, and this is non-negligible, especially when dealing with large cell arrays as in your example.
It is reasonable to assume that a cell array is implemented as a continuous array of pointers, each pointing to the actual content of the cell.
This means that you can modify the content of each cell individually without actually resizing the cell array container itself. However, this also means that adding new cells to the cell array requires dynamic storage allocation and this is why preallocating memory for a cell array improves performance.
A related question would be, if I would like to make a data container for heterogeneous data in Matlab (preallocation is not possible since the final size is not known in the beginning)
Not knowing the final size may indeed be a problem, but you could always preallocate a cell array with the maximum supported size necessary (if there is one), and remove the empty cells in the end. I also suggest that you look into implementing linked lists in MATLAB.

Recursive loop optimization

Is there a way to rewrite my code to make it faster?
for i = 2:length(ECG)
u(i) = max([a*abs(ECG(i)) b*u(i-1)]);
end;
My problem is the length of ECG.

You should pre-allocate u like this
>> u = zeros(size(ECG));
or possibly like this
>> u = NaN(size(ECG));
or maybe even like this
>> u = -Inf(size(ECG));
depending on what behaviour you want.
When you pre-allocate a vector, MATLAB knows how big the vector is going to be and reserves an appropriately sized block of memory.
If you don't pre-allocate, then MATLAB has no way of knowing how large the final vector is going to be. Initially it will allocate a short block of memory. If you run out of space in that block, then it has to find a bigger block of memory somewhere, and copy all the old values into the new memory block. This happens every time you run out of space in the allocated block (which may not be every time you grow the array, because the MATLAB runtime is probably smart enough to ask for a bit more memory than it needs, but it is still more than necessary). All this unnecessary reallocating and copying is what takes a long time.

There are several several ways to optimize this for loop, but, surprisingly memory pre-allocation is not the part that saves the most time. By far. You're using max to find the largest element of a 1-by-2 vector. On each iteration you build this vector. However, all you're doing is comparing two scalars. Using the two argument form of max and passing it two scalar is MUCH faster: 75+ times faster on my machine for large ECG vectors!
% Set the parameters and create a vector with million elements
a = 2;
b = 3;
n = 1e6;
ECG = randn(1,n);
ECG2 = a*abs(ECG); % This can be done outside the loop if you have the memory
u(1,n) = 0; % Fast zero allocation
for i = 2:length(ECG)
u(i) = max(ECG2(i),b*u(i-1)); % Compare two scalars
end
For the single input form of max (not including creation of random ECG data):
Elapsed time is 1.314308 seconds.
For my code above:
Elapsed time is 0.017174 seconds.
FYI, the code above assumes u(1) = 0. If that's not true, then u(1) should be set to it's value after preallocation.

Caching Matlab function results to file

I'm writing a simulation in Matlab.
I will eventually run this simulation hundreds of times.
In each simulation run, there are millions of simulation cycles.
In each of these cycles, I calculate a very complex function, which takes ~0.5 sec to finish.
The function input is a long bit array (>1000 bits) - which is an array of 0 and 1.
I hold the bit arrays in a matrix of 0 and 1, and for each one of them I only run the function once - as I save the result in a different array (res) and check if the bit array is in the matrix before running the functions:
for i=1:1000000000
%pick a bit array somehow
[~,indx] = ismember(bit_array,bit_matrix,'rows');
if indx == 0
indx = length(results) + 1;
bit_matrix(indx,:) = bit_array;
res(indx) = complex_function(bit_array);
end
result = res(indx)
%do something with result
end
I have two quesitons, really:
Is there a more efficient way to find the index of a row in a matrix then 'ismember'?
Since I run the simulation many times, and there is a big overlap in the bit-arrays I'm getting, I want to cache the matrix between runs so that I don't recalculate the function over the same bit-arrays over and over again. How do I do that?

The answer to both questions is to use a map. There are a few steps to do this.
First you will need a function to turn your bit_array into either a number or a string. For example, turn [0 1 1 0 1 0] into '011010'. (Matlab only supports scalar or string keys, which is why this step is required.)
Defined a map object
cachedRunMap = containers.Map; %See edit below for more on this
To check if a particular case has been run, use iskey.
cachedRunMap.isKey('011010');
To add the results of a run use the appending syntax
cachedRunMap('011010') = [0 1 1 0 1]; %Or whatever your result is.
To retrieve cached results, use the getting syntax
tmpResult = cachedRunMap.values({'011010'});
This should efficiently store and retrieve values until you run out of system memory.
Putting this together, now your code would look like this:
%Hacky magic function to convert an array into a string of '0' and '1'
strFromBits = #(x) char((x(:)'~=0)+48); %'
%Initialize the map
cachedRunMap = containers.Map;
%Loop, computing and storing results as needed
for i=1:1000000000
%pick a bit array somehow
strKey = strFromBits(bit_array);
if cachedRunMap.isKey(strKey)
result = cachedRunMap(strKey);
else
result = complex_function(bit_array);
cachedRunMap(strKey) = reult;
end
%do something with result
end
If you want a key which is not a string, that needs to be declared at step 2. Some examples are:
cachedRunMap = containers.Map('KeyType', 'char', 'ValueType', 'any');
cachedRunMap = containers.Map('KeyType', 'double', 'ValueType', 'any');
cachedRunMap = containers.Map('KeyType', 'uint64', 'ValueType', 'any');
cachedRunMap = containers.Map('KeyType', 'uint64', 'ValueType', 'double');
Setting a KeyType of 'char' sets the map to use strings as keys. All other types must be scalars.
Regarding issues as you scale this up (per your recent comments)
Saving data between sessions: There should be no issues saving this map to a *.mat file, up to the limits of your systems memory
Purging old data: I am not aware of a straightforward way to add LRU features to this map. If you can find a Java implementation you can use it within Matlab pretty easily. Otherwise it would take some thought to determine the most efficient method of keeping track of the last time a key was used.
Sharing data between concurrent sessions: As you indicated, this probably requires a database to perform efficiently. The DB table would be two columns (3 if you want to implement LRU features), the key, value, (and last used time if desired). If your "result" is not a type which easily fits into SQL (e.g. a non-uniform size array, or complex structure) then you will need to put additional thought into how to store it. You will also need a method to access the database (e.g. the database toolbox, or various tools on the Mathworks file exchange). Finally you will need to actually setup a database on a server (e.g. MySql if you are cheap, like me, or whatever you have the most experience with, or can find the most help with.) This is not actually that hard, but it takes a bit of time and effort the first time through.
Another approach to consider (much less efficient, but not requiring a database) would be to break up the data store into a large (e.g. 1000's or millions) number of maps. Save each into a separate *.mat file, with a filename based on the keys contained in that map (e.g. the first N characters of your string key), and then load/save these files between sessions as needed. This will be pretty slow ... depending on your usage it may be faster to recalculate from the source function each time ... but it's the best way I can think of without setting up the DB (clearly a better answer).

For a large list, a hand-coded binary search can beat ismember, if maintaining it in sorted order isn't too expensive. If that's really your bottleneck. Use the profiler to see how much the ismember is really costing you. If there aren't too many distinct values, you could also store them in a containers.Map by packing the bit_matrix in to a char array and using it as the key.
If it's small enough to fit in memory, you could store it in a MAT file using save and load. They can store any basic Matlab datatype. Have the simulation save the accumulated res and bit_matrix at the end of its run, and re-load them the next time it's called.

I think that you should use containers.Map() for the purpose of speedup.
The general idea is to hold a map that contains all hash values. If your bit arrays have uniform distribution under the hash function, most of the time you won't need the call to ismember.
Since the key type cannot be an array in Matlab, you can calculate some hash function on your array of bits.
For example:
function s = GetHash(bitArray)
s = mod( sum(bitArray), intmax('uint32'));
end
This is a lousy hash function, but enough to understand the principle.
Then the code would look like:
map = containers.Map('KeyType','uint32','ValueType','any');
for i=1:1000000000
%pick a bit array somehow
s = GetHash(bit_array);
if isKey %Do the slow check.
[~,indx] = ismember(bit_array,bit_matrix,'rows');
else
map(s) = 1;
continue;
end
if indx == 0
indx = length(results) + 1;
bit_matrix(indx,:) = bit_array;
res(indx) = complex_function(bit_array);
end
result = res(indx)
%do something with result
end

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Incremental appending: How to avoid performance penalty of struct arrays - matlab

Related

How can I prevent memory churn in MATLAB?

Matlab vectorization of multiple embedded for loops

Preallocation of cell array in matlab

Recursive loop optimization

Caching Matlab function results to file

Categories

Resources