How to quickly load a small variable from huge .mat files? - matlab

I have a trained_model.mat file whose size is around 23 GB.
This file has 6 variables:
4 of them are 1 x 1 doubles,
1 is a 48962 x 1 double,
1 is a TreeBagger object (this accounts for most of the size).
I want to quickly load only the 48962 x 1 variable, whose name is Y_hat, but it is taking an eternity. I am running this code on a compute node of a cluster with 256 GB of RAM, and no other user processes are running on this system.
I have already tried load('trained_model.mat', 'Y_hat');, but this also takes a very long time. Any suggestions will be greatly appreciated.

% Create a MAT-file object, m, connected to the MAT-file.
% The object lets you access and change variables directly in the file
% without loading them into memory.
m = matfile('trained_model.mat');
your_data_48962x1 = m.Y_hat;
% This should be faster than load.
More info in the matfile documentation on the MathWorks site.
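One caveat: matfile can only read a variable partially and efficiently from version 7.3 (HDF5-based) MAT-files. If the 23 GB file was saved in an older format, a one-time conversion may pay off; this is a sketch, and the new file name is illustrative:

% One-time conversion (assumes enough RAM for a full load, which a
% 256 GB node has); -v7.3 files support efficient partial reads.
S = load('trained_model.mat');                           % slow, but only once
save('trained_model_v73.mat', '-struct', 'S', '-v7.3');  % re-save as v7.3
m = matfile('trained_model_v73.mat');
Y_hat = m.Y_hat;                                         % fast partial read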

Related

Collect data using simout for an experiment with thousands of runs

I'm using the code shown below to run a Simulink model for thousands of runs, and I want to collect all the results for each run.
Is there a way to collect the result of each run and then organize them?
I did try simout, but I got the result for just one run.
Run(1).Settings = {'....'};
Run(2).Settings = {'....'};
....
dirout = sprintf('......', clock);
mkdir(dirout);
numofruns = length(Run); % or I can set it to 10000
counter = 0;
for i = 1:numofruns
    counter = counter + 1;
    disp(['Run: ' num2str(counter) '/' num2str(numofruns)])
    for j = 1:size(Run(i).Settings, 1)
        set_param([modelname '/' Run(i).Settings{j,1} '/enabled/' ...
            Run(i).Settings{j,2}], 'value', num2str(Run(i).Settings{j,3}));
    end
    set_param(modelname, 'StopTime', num2str(StopTime));
    sim(modelname);
    filename = sprintf('%s/simout_%05.0f.mat', dirout, i);
    simout = simout';
    save(filename, 'simout');
end
The collected results should show the outcomes of every single run.
For example:
simout of run 1
simout of run 2
and so on
Your help is highly appreciated
A 1000 x 2 array of double-precision floating-point numbers only takes up 16000 bytes:
>> myMatrix = rand(1000, 2);
>> whos('myMatrix')
  Name            Size            Bytes  Class     Attributes
  myMatrix      1000x2            16000  double
so you should be able to fit tens of thousands of them in memory without trouble. If your simulation output will always be the same size, you can store them in a 3-dimensional array:
% Preallocate the array to prevent repeated memory reallocation, which is slow.
resultArray = zeros(numofruns, 1000, 2);
for i = 1:numofruns
    % run the simulation here; assume it returns a 1000 x 2 matrix simout
    resultArray(i,:,:) = simout;
end
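To read a single run back as an ordinary 1000 x 2 matrix, squeeze out the leading singleton dimension:

runK = squeeze(resultArray(k, :, :)); % 1000 x 2 result of run k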
If the number of rows may vary from one run to the next, you can use a cell array:
resultCellArray = cell(numofruns, 1); % note: cell(numofruns) alone would create an n-by-n cell array
for i = 1:numofruns
    % run the simulation here
    resultCellArray{i} = simout;
end
If you really are generating too much data to fit in memory at once, but you want to store it in one file and be able to access arbitrary subsets of it for analysis, you probably want to look at the techniques for working with large MAT-files. This will be much, much slower than handling data in memory.
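As a minimal sketch of that matfile-based route, assuming each run still produces a 1000 x 2 simout (the file name is illustrative):

m = matfile('allRuns.mat', 'Writable', true);
m.resultArray(numofruns, 1000, 2) = 0;   % creates and sizes the variable on disk
for i = 1:numofruns
    % run the simulation here, producing simout (1000 x 2)
    m.resultArray(i, :, :) = reshape(simout, [1 1000 2]); % written straight to the file
end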
Alternatively, you could try using the Simulation Data Inspector, although I don't know whether that can handle data too large for memory.

Matlab Horzcat - Out of memory

Is there any trick to avoid an out-of-memory error in MATLAB?
I am assuming that the reason it shows up is that MATLAB is very inefficient in using horzcat and actually needs to duplicate matrices temporarily.
I have a matrix A of size 108977555 x 25. I want to merge this with three vectors d, m and y of size 108977555 x 1 each.
My machine has 32 GB of RAM, and the above matrix + vectors occupy 18 GB.
Now I want to run the following command:
A = [A(:,1:3), d, m, y, A(:,5:end)];
But that yields the error:
Error using horzcat
Out of memory. Type HELP MEMORY for your options.
Any trick to do this merge?
Working with Large Data Sets. If you are working with large data sets, you need to be careful when increasing the size of an array to avoid getting errors caused by insufficient memory. If you expand the array beyond the available contiguous memory of its original location, MATLAB must make a copy of the array and set this copy to the new value. During this operation, there are two copies of the original array in memory.
Restart MATLAB; I often find it doesn't fully clean up its memory, or its memory gets fragmented, leading to lower maximal array sizes.
Change your datatype (if you can). E.g. if you're only dealing with numbers 0-255, use uint8; the memory size will be reduced by a factor of 8 compared to an array of doubles.
Start off with A already large enough (i.e. 108977555 x 27 instead of 108977555 x 25) and insert in place:
A(:, 4) = d;
clear d
A(:, 5) = m;
clear m
A(:, 6) = y;
Merge the data into one datatype to reduce the total memory requirement, e.g. a date easily fits into one uint32 (see the sketch after this list).
Leave the data separated; think about why you want the data in one matrix in the first place and whether that is really necessary.
Use C code to do the data allocation yourself (only if you're really desperate).
Further reading: https://nl.mathworks.com/help/matlab/matlab_prog/memory-allocation.html
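A hedged sketch of the date-packing idea above, assuming d, m and y hold day/month/year values (all nonnegative and small enough for 8/8/16 bits):

% Pack day/month/year into one uint32 per row instead of three double
% columns (24 bytes per row shrink to 4).
packedDate = bitor(bitor(bitshift(uint32(y), 16), bitshift(uint32(m), 8)), uint32(d));
% Unpack later:
y2 = bitshift(packedDate, -16);
m2 = bitand(bitshift(packedDate, -8), uint32(255));
d2 = bitand(packedDate, uint32(255));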
Even if you could make it work using Gunther's suggestions, the result will just occupy memory. Right now it takes up more than half of the available memory. So, what are you planning to do then? Even a simple B = A + 1 doesn't fit. The only things you can do are operations like sum, or operations on part of the array.
So, you should consider moving to tall arrays and other related big-data concepts, which are exactly meant for working with such large datasets.
https://www.mathworks.com/help/matlab/tall-arrays.html
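A minimal tall-array sketch, assuming the matrix has been exported to one or more chunked files that a datastore can read (the path and column name below are hypothetical):

ds = datastore('A_chunks/*.csv'); % chunked on-disk copy of the data
tA = tall(ds);                    % tall table; operations are deferred
s = sum(tA.Var1);                 % e.g. sum of the first column
result = gather(s);               % evaluated in passes over the files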
You can first try the strategies for efficient memory management described on the official MathWorks site: https://in.mathworks.com/help/matlab/matlab_prog/strategies-for-efficient-use-of-memory.html
Use single (4 bytes) or some other smaller data type instead of double (8 bytes) if your code can work with that.
If possible, use block processing (by rows or columns), i.e. store blocks as separate MAT-files and load and access only those parts of the matrix that are required.
Use the matfile command to load large variables in parts. Perhaps something like this:
save('A.mat', 'A', '-v7.3')
oldMat = matfile('A.mat');
clear A
newMat = matfile('Anew.mat', 'Writable', true); % empty MAT-file; note the spelling 'Writable'
for i = 1:27
    if (i < 4),  newMat.A(:,i) = oldMat.A(:,i);   end
    if (i == 4), newMat.A(:,i) = d; end
    if (i == 5), newMat.A(:,i) = m; end
    if (i == 6), newMat.A(:,i) = y; end
    if (i > 6),  newMat.A(:,i) = oldMat.A(:,i-2); end
end

Out of memory - default matlab database and code

I'm learning the neural network toolbox with the MATLAB examples and I keep getting this error:
Out of memory. Type HELP MEMORY for your options.
Error in test2 (line 10)
xTest = zeros(inputSize,numel(xTestImages));
Here is my simple code:
% Get the number of pixels in each image
imageWidth = 28;
imageHeight = 28;
inputSize = imageWidth*imageHeight;
% Load the test images
[xTestImages, outputs] = digittest_dataset;
% Turn the test images into vectors and put them in a matrix
xTest = zeros(inputSize,numel(xTestImages));
for i = 1:numel(xTestImages)
    xTest(:,i) = xTestImages{i}(:);
end
The code is written according to this MathWorks example (but I'm trying to build my own custom network). I reinstalled MATLAB, set the Java heap to its maximum, freed some disk space and deleted the rest of the neural network. It is still not working. Any ideas how to fix this problem?
As written above, the line:
xTest = zeros(inputSize,numel(xTestImages)); % xTestImages is 1x5000
would yield a matrix of 28^2 * 5000 = 3.92e6 elements. Every element is double precision (8 bytes), hence the matrix consumes around 30 MB...
You stated that the command memory shows the following:
Maximum possible array:               29 MB (3.054e+07 bytes) *
Memory available for all arrays:     467 MB (4.893e+08 bytes) **
Memory used by MATLAB:               624 MB (6.547e+08 bytes)
Physical Memory (RAM):              3067 MB (3.216e+09 bytes)
So the first line shows the limitation for ONE single array.
So a few things to consider:
I guess clear all or quitting some other running applications does not improve the situation?
Do you use a 32- or 64-bit OS? And 32- or 64-bit MATLAB?
Did you try to change the Java heap settings? https://de.mathworks.com/help/matlab/matlab_external/java-heap-memory-preferences.html
I know this won't fix the problem, but maybe it will help you keep working in the meantime: you could create the matrix with single precision, which should work for your test case. Simply pass 'single' as the class argument when creating the matrix.
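For the code above that would be, e.g.:

% Allocate the test matrix in single precision (4 bytes per element,
% halving the memory required compared to double):
xTest = zeros(inputSize, numel(xTestImages), 'single');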
The out-of-memory error was caused by the Levenberg–Marquardt algorithm: it creates a huge Jacobian matrix for its calculations when the data set is big.
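If that is the cause, two commonly suggested mitigations (a hedged sketch for the classic neural network toolbox API) are switching to a training function that does not build a Jacobian, or letting trainlm compute it in chunks:

net.trainFcn = 'trainscg';          % scaled conjugate gradient: no Jacobian needed
% ...or keep trainlm but trade speed for memory:
net.efficiency.memoryReduction = 2; % compute the Jacobian in 2 chunks (try 2, 3, ...)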

Save not successful for big cell array in matlab

I am working with MATLAB 2009b in a Windows 7 64-bit environment.
I am not able to save a cell array of size 2.5 GB using the -v7.3 switch, but saving is successful for a struct of arrays of size 6 GB containing only double values. Please advise me about the alternatives I can try.
What is working
I am able to save following variable in workspace successfully.
>> whos
  Name              Size         Bytes        Class    Attributes
  master_data       1x159        6296489360   struct
>> save('arrayOfStruct.mat','-v7.3')
Here master_data is an array of 159 structures. Each of these 159 structures has five arrays of 1 million double values. A 594 MB MAT-file is saved in the filesystem.
What is not working
I am not able to save a cell array which contains strings, doubles and arrays of doubles.
>> whos
  Name                 Size          Bytes        Class    Attributes
  result_combined      57888x100     2544467328   cell
>> save('cellArray.mat','-v7.3');
When I execute the save command, a cellArray.mat file of size 530 MB is generated in the filesystem, but the prompt never returns to MATLAB. (I have waited for more than 4 hours, and ran this after restarting the computer.) If I terminate the MATLAB program while it is waiting for the prompt to return, the generated cellArray.mat is not usable: MATLAB reports that the file cannot be loaded because it is corrupt.
Please suggest what I can try to save this variable result_combined.
NOTE
The save command works successfully in MATLAB 2015a. Any suggestions as to how I can make it work in MATLAB 2009b? I am using MATLAB 2009b as my default and do not want to migrate to 2015a, as it might break my existing setup.

Quickest way to search txt/bin/etc file for numeric data greater than specified value

I have a 37,000,000 x 1 double array saved in a MAT-file under a structure labelled r. I can point to this file using matfile(...) and then just use the find(...) command to find all values above a threshold val.
This finds all the values greater than or equal to 0.004, but given the size of my data, this takes some time.
I want to reduce the time and have considered using bin files (apparently they are better than txt files in terms of not losing precision?) etc., however I'm not knowledgeable about the syntax/method.
I've managed to save the data into the bin file, but what is the quickest way to search through this large file?
The only output data I want are the actual values greater than my specified value.
Is using a bin file best? Or a MAT-file? Etc.
I don't want to load the entire file into MATLAB. I want to conserve MATLAB memory, as other programs may need the space and I don't want memory errors again.
As @OlegKomarov points out, a 37,000,000-element array of doubles is not very big. Your real problem may be that you don't have enough RAM and/or are using a 32-bit version of MATLAB. The find function will require additional memory for the input and for the output array of indices.
If you want to load and process your data in chunks, you can use the matfile function. Here's a small example:
fname = [tempname '.mat'];                 % use a file in the temp directory for the example
matObj = matfile(fname,'Writable',true);   % create the MAT-file
matObj.r = rand(37e4,1e2);                 % write random data to variable r in the file
szR = size(matObj,'r');                    % get dimensions of r in the file
idx = [];
for i = 1:szR(2)
    idx = [idx; find(matObj.r(:,i) > 0.999)]; % indices of elements of r greater than 0.999
end
delete(fname);                             % delete the example file
This will save you memory, but it is definitely not faster than storing everything in memory and calling find once. File access is always slower (though an SSD will help a bit). The code above uses dynamic memory allocation for the idx variable, but the memory is only reallocated a few times, in large chunks, which can be quite fast in current versions of MATLAB.
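Since the question also mentions a raw .bin file: a hedged alternative is to memory-map that file and scan it in chunks, so only one slice is resident at a time ('data.bin' and the chunk size below are illustrative):

m = memmapfile('data.bin', 'Format', 'double');
n = numel(m.Data);
chunkSize = 1e6;
vals = [];
for k = 1:chunkSize:n
    block = m.Data(k:min(k+chunkSize-1, n)); % only this slice is paged in
    vals = [vals; block(block >= 0.004)];    % keep the values themselves
end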