I have a data set consisting of a large number of .mat files. Each .mat file is of considerable size, i.e. loading them is time-consuming. Unfortunately, some of them are corrupt, and load('<name>') returns an error on those files. I have implemented a try-catch routine to determine which files are corrupt. However, given that only a handful of them are corrupt, loading each file just to check whether it is corrupt takes a long time. Is there any way I can check the health of a .mat file without using load('<name>')?
I have been unsuccessful in finding such a solution anywhere.
The matfile function is used to access variables in MAT-files without loading them into memory. By changing your try-catch routine to use matfile instead of load, you avoid the overhead of loading the large files into memory.
As matfile appears to only issue a warning when reading a corrupt file, you'll have to check whether this warning was issued. This can be done using lastwarn: reset the last warning before calling matfile, and check whether a warning was issued afterwards:
lastwarn('');                          % reset the last warning
matfile(filename);                     % probe the file without loading it
[~, warnId] = lastwarn;                % retrieve the id of any warning raised
if strcmp(warnId, 'relevantWarningId')
    % File is corrupt
end
You will have to find out the relevant warning id first by running the above code on a corrupt file and saving the warnId it returns.
A more robust solution would be to calculate a checksum or hash (e.g. MD5) of each file upon creation, and compare this checksum before reading the file.
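As a sketch of the checksum idea, one way to compute an MD5 hash in MATLAB is through the JVM that ships with it; the file name below is a placeholder:

```matlab
% Compute the MD5 hash of a file via Java (available in stock MATLAB).
% 'data001.mat' is a placeholder file name.
fid = fopen('data001.mat', 'r');
bytes = fread(fid, inf, '*uint8');
fclose(fid);
md = java.security.MessageDigest.getInstance('MD5');
md.update(bytes);
hash = sprintf('%02x', typecast(md.digest(), 'uint8'));
% Store 'hash' when the file is created; recompute and compare it
% before loading to detect corruption without calling load.
```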
Related
I'm running a short script that opens a list of files one by one and saves back only one of the variables contained in each file. The process seems much slower than I expected, and it gets slower over time. I don't fully understand why, or how I could make it run faster; I always struggle with optimization. I'd appreciate any suggestions.
The code is the following (the ... substitutes the actual path, just for example):
main_dir=dir(strcat('\\storage2-...\Raw\DAQ5\'));
filename={};
for m=7:size(main_dir,1)
    m   % display loop progress
    second_dir=dir([main_dir(m).folder '\' main_dir(m).name '\*.mat']);
    for mm=1:numel(second_dir)
        filename{end+1}=[second_dir(mm).folder '\' second_dir(mm).name];
        for mmm=1:numel(filename)
            namefile=sprintf(second_dir(mm,1).name);
            load(string(filename(1,mmm)));
            save(['\\storage2-...\DAQ5\Ch1_',namefile(end-18:end-4),'.mat'], 'Ch_1_y')
        end
    end
end
The original file is about 17 MB and once the single variable is saved it is about 6 MB in size.
The MATLAB load function takes an optional additional argument to specify just a selected variable to read from the input file:
s = load('path/to/file.mat', 'Ch_1_y');
That way you don't have to spend time loading all the other variables from those input .mat files that you're just going to throw away immediately.
And using save to write MAT-files over SMB shares can be slow. You might want to call save to write to a temporary local file first, and then copy the completed file to the final destination. That sounds like more I/O, but it can actually be a net win, depending on your particular system and network. Measure it both ways to see whether it's a win in your situation.
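A minimal sketch combining both suggestions (selective load, then a local write followed by one copy); the paths and variable names are placeholders:

```matlab
% Load only the needed variable, save locally, then move to the share.
% Paths are placeholders; adjust to your own directories.
s = load('\\server\share\input.mat', 'Ch_1_y');   % selective load
tmp = [tempname '.mat'];                           % local temporary file
save(tmp, '-struct', 's', 'Ch_1_y');               % fast local write
movefile(tmp, '\\server\share\Ch1_output.mat');    % single copy over SMB
```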
I was in the middle of a project and accidentally deleted one of my major folders, which contained all my signals.
All of the signals were in .mat format, and each of them was of considerable size.
After taking my laptop to a support center, they recovered my files, but nearly all of my .mat files could not be read (all other file types, such as .m or Simulink files, are readable).
I checked various methods, but all of them say the .mat "file might be corrupt", so I want to know:
Is there a specific method to recover my missing .mat files?
Is there any way I could fix a corrupted .mat file, or some part of it?
I checked various methods, such as
loading the .mat file in MATLAB
checking the file with matfile
trying to read the .mat file with fopen and fread
using the "splitmat" code on my .mat file, as mentioned here: https://www.mathworks.com/matlabcentral/answers/98890-how-do-i-recover-data-from-a-corrupt-mat-file
but all of them say the .mat "file might be corrupt".
I am using MATLAB 2014a under Windows 7. I am running a loop that reads very big .xlsx files (~40 MB each). After I am done with a file, I use clear in order to free the memory taken by reading the file. The thing is that every once in a while the script stops and gives me an error message:
Error using xlsread (line 247)
Error: Not enough storage is available to complete this operation.
I want to emphasize that after each time I finish with a file I clear all the variables, so in each iteration only one file is loaded. If I restart MATLAB the script may work again, making me believe that somehow the clear command doesn't free all the memory that was allocated. Is there a way to really free the memory that was once allocated in MATLAB?
Thank you very much,
Ariel
If restarting MATLAB is not an option, the pack function should help. Otherwise, you could also run MATLAB without the GUI and write a shell script that starts a fresh MATLAB process for each file.
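A sketch of that shell-script approach; the script, function name, and file pattern are hypothetical (on older releases such as R2014a, -r with an explicit exit is the way to run non-interactively; newer releases also offer -batch):

```shell
#!/bin/sh
# Start one fresh MATLAB process per file so all memory is released
# between files. process_file.m is a hypothetical function that reads
# one .xlsx file and does the per-file work.
for f in data/*.xlsx; do
    matlab -nodisplay -nosplash \
        -r "try, process_file('$f'); catch e, disp(e.message); end; exit"
done
```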
I have a matrix cube that I load in my program to read data from. The size of this .mat file is 2.8 GB. I am not able to load it, getting an 'out of memory' error. Is there a way to fix this?
You can use the matfile class to work on ranges within variables inside MAT-files. See
"Load and save parts of variables in MAT-files" in the documentation.
Here's some additional discussion that discloses that this feature is new with R2011b.
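For example, a ranged read through matfile might look like this (the file and variable names are hypothetical):

```matlab
% Sketch (R2011b or later): read only part of a variable without
% loading the whole file. 'bigcube.mat' / 'cube' are placeholder names.
m = matfile('bigcube.mat');
sz = size(m, 'cube');       % query dimensions without reading the data
slab = m.cube(:, :, 1);     % read just the first slice into memory
```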
If the size of the data exceeds the available memory on your machine, then you are in trouble; this is unavoidable. However, if you only want certain variables inside the .mat file, you can try to load just those variables using the
load(filename, variables)
version of the load function. It really depends on the contents of your .mat file. If the file is 2.8 GB, you need ALL of the variables in it, and your machine does not have enough memory to cope, your only option is to buy more RAM.
EDIT: Apparently this answer is incorrect if you are running R2011b or above, as explained in the answer by Ben Voight.
I was trying to save a matrix into a MAT-file, but MATLAB returns the following message:
Warning: Variable 'listmatrix' cannot be saved to a MAT-file whose version is older than 7.3.
To save this variable, use the -v7.3 switch.
Skipping...
What does it mean to "use the -v7.3 switch"?
Should I use
save testresult.mat -v7.3 listmatrix
or something else?
Hi, I thought I'd reply to this thread as I've been trying to figure out how to save a large (>2 GB) .mat file in MATLAB v7 (v7.1.0.183, R14) and finally found a solution.
If you try to use the save command you will get the following error:
save('test.mat', 'data');
Warning: Variable 'data' cannot be saved to a MAT-file because its
storage requirements exceed 2^31 bytes. This limitation will be
addressed in a future release. Consider storing this variable in HDF5
file format (see HDF5WRITE). Skipping...
The solution is to write an HDF5 file instead:
hdf5write('test.hdf5', '/dataset1', data);
You can then read the data back into MATLAB using:
hdf5read('test.hdf5', '/dataset1');
A quick Google search says yes. Try
save -v7.3 testresult.mat listmatrix
How big is your object? (Run whos listmatrix.)
You could potentially save memory by using a different data type, such as uint8.
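As a sketch (assuming listmatrix holds small non-negative integer values; otherwise the conversion would lose data):

```matlab
% Inspect a variable's memory footprint and shrink it if the values
% fit in a smaller type. 'listmatrix' is the variable from the question.
whos listmatrix                              % shows bytes used
if all(listmatrix(:) >= 0 & listmatrix(:) <= 255)
    listmatrix = uint8(listmatrix);          % 1 byte/element instead of 8
end
save('testresult.mat', 'listmatrix', '-v7.3');
```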
http://www.mathworks.ch/matlabcentral/newsreader/view_thread/243327
http://www.mathworks.de/matlabcentral/newsreader/view_thread/307845