Save not successful for big cell array in MATLAB

I am working with MATLAB 2009b in a Windows 7 64-bit environment.
I am not able to save a 2.5 GB cell array using the -v7.3 switch, but saving a 6 GB struct of arrays containing only double values succeeds. Please advise me about the alternatives I can try.
What is working
I am able to save the following workspace variable successfully.
>> whos
  Name           Size     Bytes         Class     Attributes
  master_data    1x159    6296489360    struct
>> save('arrayOfStruct.mat','-v7.3')
Here master_data is an array of 159 structures. Each of these structures holds five arrays of 1 million double values. A 594 MB MAT-file is saved to the filesystem.
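For reference, a minimal sketch that reproduces this working case (the field names f1..f5 are hypothetical; the original has five 1M-double arrays per struct):
% Hypothetical reconstruction of the working case (field names assumed):
for k = 159:-1:1                  % reverse loop preallocates the struct array
    master_data(k).f1 = rand(1e6,1);
    master_data(k).f2 = rand(1e6,1);
    master_data(k).f3 = rand(1e6,1);
    master_data(k).f4 = rand(1e6,1);
    master_data(k).f5 = rand(1e6,1);
end
save('arrayOfStruct.mat','master_data','-v7.3');   % completes normally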
What is not working
I am not able to save a cell array that contains strings, doubles, and arrays of doubles.
>> whos
  Name               Size         Bytes         Class    Attributes
  result_combined    57888x100    2544467328    cell
>> save('cellArray.mat','-v7.3');
When I execute the save command, a cellArray.mat file of 530 MB is generated in the filesystem, but the prompt never returns to MATLAB (I have waited for more than 4 hours, and have rerun the command after restarting the computer). If I terminate MATLAB while it is waiting for the prompt to return, the generated cellArray.mat is unusable: MATLAB reports that the file cannot be loaded because it is corrupt.
Please suggest what I can try to save this variable result_combined.
NOTE
The save command works successfully in MATLAB 2015a. Any suggestions as to how I can make it work in MATLAB 2009b? I am using MATLAB 2009b as my default and do not want to migrate to 2015a, as it might break my existing setup.
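One workaround worth trying in 2009b (a sketch only, untested on that release) is to split result_combined into column blocks and save each block to its own MAT-file, so no single v7.3 write has to handle the whole cell array:
% Untested workaround sketch: save result_combined in column blocks.
blockSize = 10;                                  % arbitrary block width
nCols = size(result_combined,2);
for b = 1:ceil(nCols/blockSize)
    cols = (b-1)*blockSize+1 : min(b*blockSize, nCols);
    block = result_combined(:,cols);
    save(sprintf('cellArray_block%02d.mat',b), 'block', '-v7.3');
end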

Related

"MATLAB: corrupted double-linked list"

I recently started getting the error
MATLAB: corrupted double-linked list
about 90% of the time when running a moderately complex MATLAB model on a supercomputing cluster.
The model runs fine on my laptop (about 15 hours per run; the cluster is used for parameter sweeps), and has done so for nearly 2 years.
The only difference in recent runs is that the output is more verbose and creates a large array (1.5 GB) that wasn't there previously.
The general pattern for this array: it is a 3D array, built by saving a 2D slice of the model each timestep. The array is initialised outside the timestepping loop, and slices are overwritten as the model progresses:
%** init
big_array = zeros(a, b, c);

%** loop
for i = 1:c
    %%%% DO MODEL %%%%
    %** save snapshot to array
    big_array(:,:,i) = modelSnapshot';
end
I have checked that the indexing of this array is correct (i.e., big_array(:,:,i) = modelSnapshot' has the correct dimensions/size).
Does anyone have any experience with this error and can point to solutions?
The only relevant results I can find on Google concern MATLAB's MEX-file functionality, which is not active in my model.
(Crashes are on MATLAB 2016a; the laptop runs 2014a.)
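If the new 1.5 GB array is implicated, one way to take it out of process memory entirely (a sketch, not a confirmed fix for this particular crash) is to stream each slice to a v7.3 MAT-file with matfile:
% Sketch: write each 2D slice straight to disk instead of holding the
% full 1.5 GB array in RAM (matfile requires a -v7.3 file).
m = matfile('big_array.mat','Writable',true);
m.big_array(a,b,c) = 0;          % preallocate the on-disk variable
for i = 1:c
    %%%% DO MODEL %%%%
    m.big_array(:,:,i) = modelSnapshot';
end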

How to quickly load a small variable from huge .mat files?

I have a trained_model.mat file, whose size is around 23 GB.
This file has 6 variables,
4 of them are 1x1 doubles,
1 is a 48962x1 double,
1 is a TreeBagger object (this occupies most of the size).
I want to quickly load only the 48962x1 variable, whose name is Y_hat, but it is taking an eternity. I am running this code on a compute node of a cluster with 256 GB of RAM, and no other user processes are running on the system.
I have already tried load('trained_model.mat', 'Y_hat');, but this also takes a very long time. Any suggestions will be greatly appreciated.
% Create a MAT-file object, m, connected to the MAT-file
% The object allows you to access and change variables directly in a MAT-file
% without having to load the variables into memory
m = matfile('trained_model.mat');
your_data_48962x1 = m.Y_hat;
% It should be faster than load
More info in the matfile documentation on the MathWorks site.
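To sanity-check before loading anything, you can also list the file's variables and sizes without reading them in. Note that matfile only reads subsets of variables efficiently from -v7.3 MAT-files; older formats are loaded in full:
% Inspect the file's contents without loading any data:
whos('-file','trained_model.mat')
whos(m)   % the same information via the existing matfile object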

Quickest way to search txt/bin/etc file for numeric data greater than specified value

I have a 37,000,000x1 double array saved in a matfile under a structure labelled r. I can point to this file using matfile(...) and then just use the find(...) command to find all values above a threshold val.
This finds all the values greater than or equal to 0.004, but given the size of my data, it takes some time.
I want to reduce the time, and have considered using bin files (apparently they are better than txt files in terms of not losing precision?), but I'm not knowledgeable about the syntax/method.
I've managed to save the data into a bin file, but what is the quickest way to search through this large file?
The only output data I want are the actual values greater than my specified value.
Is using a bin file best? Or a matfile? Etc.
I don't want to load the entire file into MATLAB. I want to conserve MATLAB memory, as other programs may need the space and I don't want memory errors again.
As @OlegKomarov points out, a 37,000,000-element array of doubles is not very big. Your real problem may be that you don't have enough RAM and/or are using a 32-bit version of MATLAB. The find function will require additional memory for the input and the output array of indices.
If you want to load and process your data in chunks, you can use the matfile function. Here's a small example:
fname = [tempname '.mat'];               % Use temp directory file for example
matObj = matfile(fname,'Writable',true); % Create MAT-file
matObj.r = rand(37e4,1e2);               % Write random data to r variable in file
szR = size(matObj,'r');                  % Get dimensions of r variable in file
idx = [];
for i = 1:szR(2)
    idx = [idx; find(matObj.r(:,i)>0.999)]; % Indices of r greater than 0.999
end
delete(fname);                           % Delete example file
This will save you memory, but it is definitely not faster than storing everything in memory and calling find once. File access is always slower (though an SSD will help a bit). The code above grows the idx variable dynamically, but the memory is only re-allocated a few times in large chunks, which can be quite fast in current versions of MATLAB.
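If you do want to try the flat binary route the question mentions, a minimal sketch (the file name r.bin and the chunk size are assumptions) is to fread the file in blocks and keep only the values above the threshold:
% Sketch: scan a flat binary file of doubles in chunks, keeping values > val.
val = 0.004;
chunkSize = 1e6;                          % elements per read (arbitrary)
fid = fopen('r.bin','r');
vals = [];
while ~feof(fid)
    chunk = fread(fid, chunkSize, 'double');
    vals = [vals; chunk(chunk > val)];    % keep only values above threshold
end
fclose(fid);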

Saving a large cell matrix (string variables) in Matlab is very slow and the file size is massive

I have a big cell matrix (string variables) with 40,000,000 rows. I first check the size using whos('file'), and it tells me that the matrix occupies 4.5 GB in the workspace. Then I use save('file','-v7.3') to export it to a .mat file. It takes a very long time: after 10 minutes it is still saving, and when I check the file in the target directory, its size is already 12 GB and still increasing. Can anybody tell me what is happening? Is there any other way to save this matrix? It doesn't need to be a .mat file; it can be .txt or something else.
A small part of the matrix.
'00086810'
'00192610'
'00213T10'
'00339010'
'00350L10'
'00350P10'
'00428010'
'00431F10'
'00433710'
'00723110'
'00743710'
'00818210'
'00818810'
'01031710'
'01204610'
'01747610'
'01747F10'
'01852Q10'
'01853510'
'01887110'
'01888510'
'01890A10'
'01920510'
'02316010'
'02343R10'
'02361310'
'02391210'
'02407310'
'02407640'
'02408H10'
'02434310'
'02520W10'
'02581610'
Let's test:
test = 'helloooooo'
whos('test')
  Name    Size    Bytes    Class    Attributes
  test    1x10    20       char
save('A','test')
The file A.mat is 184 bytes on disk.
Let's test bigger data:
symbols = ['a':'z' 'A':'Z' '0':'9'];
MAX_ST_LENGTH = 500;
stLength = randi(MAX_ST_LENGTH);
testcell = cell(1,100);          % preallocate the cell array
for ii = 1:100
    nums = randi(numel(symbols),[1 stLength]);
    testcell{ii} = symbols(nums);
end
save('test','testcell')
whos('testcell')
  Name        Size     Bytes    Class    Attributes
  testcell    1x100    52200    cell
The file test.mat is 15.7 KB.
So save compresses the data. But I realised that this depends on the data; usually I get about 3x compression.
Can you show us your code? Maybe you are saving things incorrectly.
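Since the asker says the output doesn't have to be a .mat file, a plain text dump is also worth trying; fprintf streams a cell array of strings without the per-cell MAT-file overhead (a sketch; C stands in for the actual variable):
% Sketch: dump a cell array of strings (here called C) to a text file.
fid = fopen('strings.txt','w');
fprintf(fid, '%s\n', C{:});   % the format recycles over every string
fclose(fid);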

Matlab variable taking forever to save

I have a MATLAB variable that is a 3x6 cell array. One of the columns of the cell array holds at most 150-200 small RGB images of about 16x20 pixels (again, at most). The rest of the columns are:
an equal number of labels that are strings of 3 or 4 characters,
an image mask, which is about 350x200,
3 integers.
For some reason saving this object is taking a very long time, at least for the size of the object. It has already been 10 minutes (which isn't too bad, but I plan to expand the object to hold several thousand of those small images) and MATLAB doesn't seem to be making any progress. In fact, when I watch the file in its directory, its size cycles between 0 bytes and about 120 kB (i.e., it increases to 120 kB in steps of 30 or 40 kB, then restarts).
Is this normal behavior? Do MATLAB variables always take so long to save? What's going on here?
Mistake: I'm saving AllData, not my SVM variable. AllData has the same data as the SVM keeper, minus the actual SVM itself and one integer.
What parts of the code would be helpful to show for solving this? The code itself is a few hundred lines broken up into several functions. What would be important to consider to troubleshoot this? Where the variable is created? Where it's saved? The way I create the smaller images?
AllData/curData are just subsets of the 3x7 array... actually it's a 3x8, but the last column is just an int.
Interesting side point: I interrupted the saving process, and the variable seemed to save just fine. I trained a new SVM on the saved data and it works fine. I'd rather not have to do that in the future, though.
Using whos:
  Name                           Size         Bytes      Class      Attributes
  AllData                        3x6          473300     cell
  Image                          240x352x3    253440     uint8
  RESTOREDEFAULTPATH_EXECUTED    1x1          1          logical
  SVMKeeper                      3x8          2355638    cell
  ans                            3x6          892410     cell
  curData                        3x6          473300     cell
  dirpath                        1x67         134        char
  im                             240x352x3    1013760    single
  s                              1x1          892586     struct
Updates:
1. Does this always happen, or did you only do it once?
   - It always happens.
2. Does it take the same time when you save it to a different (local) drive?
   - I will investigate this more when I get back to that computer.
3. How long does it take to save a 500 kB matrix to that folder?
   - Almost instantaneous.
4. And as asked above, what is the function call that you use?
   - Code added below.
(Image is an RGB image)
MaskedImage(:,:,1) = Image(:,:,1).*Mask;
MaskedImage(:,:,2) = Image(:,:,2).*Mask;
MaskedImage(:,:,3) = Image(:,:,3).*Mask;
MaskedImage = im2single(MaskedImage);
....
(I use some method to create a bounding box around my 16x20 image)
(this is in a loop that occurs about 100-200 times)
Misfire = input('is this a misfire?','s');
if (strcmpi(Misfire,'yes'))
    curImageReal = MaskedImage(j:j+Ybound, i:i+Xbound, :);
    Training{curTrainingIndex} = curImageReal; % Training is a cell array of images
    Labels{curTrainingIndex} = 'ncr';
    curTrainingIndex = curTrainingIndex + 1;
end
(the loop ends)...
SaveAndUpdate = input('Would you like to save this data?(say yes,definitely)','s');
undecided = 1;
while (undecided)
    if (strcmpi(SaveAndUpdate,'yes,definitely'))
        AllData{curSVM,4} = Training;
        AllData{curSVM,5} = Labels;
        save(strcat(dirpath,'/',TrainingName),'AllData'); % <--- STUCK HERE
        undecided = 0;
    else
        DontSave = input('Im not going to save. Say YESNOSAVE to NOT SAVE','s');
        if (strcmpi(DontSave,'yesnosave'))
            undecided = 0;
        else
            SaveAndUpdate = input('So... save? (say yes,definitely)','s');
        end
    end
end
It is a bit unclear whether you are doing some custom file saving or not. If it is the former, I'm guessing that you have a really slow save loop going on, or maybe some hardware issue. Try saving the data using MATLAB's save function:
tic
save('test.mat', 'AllData')
toc
If that works fine, try to work your way from there, e.g., by saving one element at a time.
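For example, timing the save of each cell of AllData separately should isolate which entry is slow (a sketch; part.mat is just a scratch file):
% Sketch: time the save of each cell individually to find the slow one.
for k = 1:numel(AllData)
    piece = AllData{k};
    tic
    save('part.mat','piece');
    fprintf('cell %d: %.2f s\n', k, toc);
end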
You can profile your code using the profiler: open it with the command profile viewer and then type the code, script, or function that you want to profile into the input text field.
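For a one-off check, something like this (a minimal sketch) wraps the slow call directly:
profile on
save('test.mat','AllData')   % the operation under investigation
profile viewer               % stops profiling and shows where the time went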
This isn't a great answer, but it seems the problem was that I was saving the version of my image after I had converted it to single. I don't know why this caused such a dramatic slowdown (after removing that line of code it worked instantly), so if someone could edit my answer to shed more light on the situation, that would be appreciated.
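If the single conversion really is the culprit, one option (a sketch, not from the original answer) is to store the images as uint8 and convert only when they are actually used; the whos output above shows the same image is 4x smaller as uint8 (253440 vs 1013760 bytes):
% Sketch: keep stored images as compact uint8, convert to single on demand.
Training{curTrainingIndex} = im2uint8(curImageReal);   % store as uint8
% ... later, when an image is actually needed for training:
trainImg = im2single(Training{curTrainingIndex});      % convert at use time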