deleting variables from a .mat file - matlab

Does anyone here know how to delete a variable from a matlab file? I know that you can add variables to an existing matlab file using the save -append method, but there's no documentation on how to delete variables from the file.
Before someone says, "just save it", its because I'm saving intermediate processing steps to disk to alleviate memory problems, and in the end there will be almost 10 GB of intermediate data per analysis routine. Thanks!

Interestingly enough, you can use the -append option with SAVE to effectively erase data from a .mat file. Note this excerpt from the documentation (bold added by me):
For MAT-files, -append adds new variables to the file or replaces the saved values of existing variables with values in the workspace.
In other words, if a variable in your .mat file is called A, you can save over that variable with a new copy of A (that you've set to []) using the -append option. There will still be a variable called A in the .mat file, but it will be empty and thus reduce the total file size.
Here's an example:
>> A = rand(1000); %# Create a 1000-by-1000 matrix of random values
>> save('savetest.mat','A'); %# Save A to a file
>> whos -file savetest.mat %# Look at the .mat file contents
Name Size Bytes Class Attributes
A 1000x1000 8000000 double
The file size will be about 7.21 MB. Now do this:
>> A = []; %# Set the variable A to empty
>> save('savetest.mat','A','-append'); %# Overwrite A in the file
>> whos -file savetest.mat %# Look at the .mat file contents
Name Size Bytes Class Attributes
A 0x0 0 double
And now the file size will be around 169 bytes. The variable is still in there, but it is empty.

10 GB of data? Updating multi-variable MAT files could get expensive due to MAT format overhead. Consider splitting the data up and saving each variable to a different MAT file, using directories for organization if necessary. Even if you had a convenient function to delete variables from a MAT file, it would be inefficient. The variables in a MAT file are layed out contiguously, so replacing one variable can require reading and writing much of the rest. If they're in separate files, you can just delete the whole file, which is fast.
To see this in action, try this code, stepping through it in the debugger while using something like Process Explorer (on Windows) to monitor its I/O activity.
function replace_vars_in_matfile
x = 1;
% Random dummy data; zeros would compress really well and throw off results
y = randi(intmax('uint8')-1, 100*(2^20), 1, 'uint8');
tic; save test.mat x y; toc;
x = 2;
tic; save -append test.mat x; toc;
y = y + 1;
tic; save -append test.mat y; toc;
On my machine, the results look like this. (Read and Write are cumulative, Time is per operation.)
Read (MB) Write (MB) Time (sec)
before any write: 25 0
first write: 25 105 3.7
append x: 235 315 3.6
append y: 235 420 3.8
Notice that updating the small x variable is more expensive than updating the large y. Much of this I/O activity is "redundant" housekeeping work to keep the MAT file format organized, and will go away if each variable is in its own file.
Also, try to keep these files on the local filesystem; it'll be a lot faster than network drives. If they need to go on a network drive, consider doing the save() and load() on local temp files (maybe chosen with tempname()) and then copying them to/from the network drive. Matlab's save and load tend to be much faster with local filesystems, enough so that local save/load plus a copy can be a substantial net win.
Here's a basic implementation that will let you save variables to separate files using the familiar save() and load() signatures. They're prefixed with "d" to indicate they're the directory-based versions. They use some tricks with evalin() and assignin(), so I thought it would be worth posting the full code.
function dsave(file, varargin)
%DSAVE Like save, but each var in its own file
%
% dsave filename var1 var2 var3...
if nargin < 1 || isempty(file); file = 'matlab'; end
[tfStruct,loc] = ismember({'-struct'}, varargin);
args = varargin;
args(loc(tfStruct)) = [];
if ~all(cellfun(#isvarname, args))
error('Invalid arguments. Usage: dsave filename <-struct> var1 var2 var3 ...');
end
if tfStruct
structVarName = args{1};
s = evalin('caller', structVarName);
else
varNames = args;
if isempty(args)
w = evalin('caller','whos');
varNames = { w.name };
end
captureExpr = ['struct(' ...
join(',', cellfun(#(x){sprintf('''%s'',{%s}',x,x)}, varNames)) ')'];
s = evalin('caller', captureExpr);
end
% Use Java checks to avoid partial path ambiguity
jFile = java.io.File(file);
if ~jFile.exists()
ok = mkdir(file);
if ~ok;
error('failed creating dsave dir %s', file);
end
elseif ~jFile.isDirectory()
error('Cannot save: destination exists but is not a dir: %s', file);
end
names = fieldnames(s);
for i = 1:numel(names)
varFile = fullfile(file, [names{i} '.mat']);
varStruct = struct(names{i}, {s.(names{i})});
save(varFile, '-struct', 'varStruct');
end
function out = join(Glue, Strings)
Strings = cellstr(Strings);
if length( Strings ) == 0
out = '';
elseif length( Strings ) == 1
out = Strings{1};
else
Glue = sprintf( Glue ); % Support escape sequences
out = strcat( Strings(1:end-1), { Glue } );
out = [ out{:} Strings{end} ];
end
Here's the load() equivalent.
function out = dload(file,varargin)
%DLOAD Like load, but each var in its own file
if nargin < 1 || isempty(file); file = 'matlab'; end
varNames = varargin;
if ~exist(file, 'dir')
error('Not a dsave dir: %s', file);
end
if isempty(varNames)
d = dir(file);
varNames = regexprep(setdiff(ls(file), {'.','..'}), '\.mat$', '');
end
out = struct;
for i = 1:numel(varNames)
name = varNames{i};
tmp = load(fullfile(file, [name '.mat']));
out.(name) = tmp.(name);
end
if nargout == 0
for i = 1:numel(varNames)
assignin('caller', varNames{i}, out.(varNames{i}));
end
clear out
end
Dwhos() is the equivalent of whos('-file').
function out = dwhos(file)
%DWHOS List variable names in a dsave dir
if nargin < 1 || isempty(file); file = 'matlab'; end
out = regexprep(setdiff(ls(file), {'.','..'}), '\.mat$', '');
And ddelete() to delete the individual variables like you asked.
function ddelete(file,varargin)
%DDELETE Delete variables from a dsave dir
if nargin < 1 || isempty(file); file = 'matlab'; end
varNames = varargin;
for i = 1:numel(varNames)
delete(fullfile(file, [varNames{i} '.mat']));
end

The only way of doing this that I know is to use the MAT-file API function matDeleteVariable. It would, I guess, be quite easy to write a Fortran or C routine to do this, but it does seem like a lot of effort for something that ought to be much easier.

I suggest you load the variables from the .mat file you want to keep, and save them to a new .mat file. If necessary, you can load and save (using '-append') in a loop.
S = load(filename, '-mat', variablesYouWantToKeep);
save(newFilename,'-struct',S,variablesYouWantToKeep);
%# then you can delete the old file
delete(filename)

Related

Save the data in a form of three columns in text files

This function reads the data from multiple mat files and save them in multiple txt files. But the data (each value) are saved one value in one column and so on. I want to save the data in a form of three columns (coordinates) in the text files, so each row has three values separated by space. Reshape the data before i save them in a text file doesn't work. I know that dlmwrite should be modified in away to make newline after three values but how?
mat = dir('*.mat');
for q = 1:length(mat)
load(mat(q).name);
[~, testName, ~] = fileparts(mat(q).name);
testVar = eval(testName);
pos(q,:,:) = testVar.Bodies.Positions(1,:,:);
%pos=reshape(pos,2,3,2000);
filename = sprintf('data%d.txt', q);
dlmwrite(filename , pos(q,:,:), 'delimiter','\t','newline','pc')
end
My data structure:
These data should be extracted from each mat file and stored in the corresponding text files like this:
332.68 42.76 42.663 3.0737
332.69 42.746 42.655 3.0739
332.69 42.75 42.665 3.074
A TheMathWorks-trainer once told me that there is almost never a good reason nor a need to use eval. Here's a snippet of code that should solve your writing problem using writematrix since dlmwrite is considered to be deprecated.
It further puts the file-handling/loading on a more resilient base. One can access structs dynamically with the .(FILENAME) notation. This is quite convenient if you know your fields. With who one can list variables in the workspace but also in .mat-files!
Have a look:
% path to folder
pFldr = pwd;
% get a list of all mat-files (returns an array of structs)
Lst = dir( fullfile(pFldr,'*.mat') );
% loop over files
for Fl = Lst.'
% create path to file
pFl = fullfile( Fl.folder, Fl.name );
% variable to load
[~, var2load, ~] = fileparts(Fl.name);
% get names of variables inside the file
varInfo = who('-file',pFl);
% check if it contains the desired variables
if ~all( ismember(var2load,varInfo) )
% display some kind of warning/info
disp(strcat("the file ",Fl.name," does not contain all required varibales and is therefore skipped."))
% skip / continue with loop
continue
end
% load | NO NEED TO USE eval()
Dat = load(pFl, var2load);
% DO WHATEVER YOU WANT TO DO
pos = squeeze( Dat.(var2load)(1,:,1:2000) );
% create file name for text file
pFl2save = fullfile( Fl.folder, strrep(Fl.name,'.mat','.txt') );
writematrix(pos,pFl2save,'Delimiter','\t')
end
To get your 3D-matrix data into a 2D matrix that you can write nicely to a file, use the function squeeze. It gets rid of empty dimensions (in your case, the first dimension) and squeezes the data into a lower-dimensional matrix
Why don't you use writematrix() function?
mat = dir('*.mat');
for q = 1:length(mat)
load(mat(q).name);
[~, testName, ~] = fileparts(mat(q).name);
testVar = eval(testName);
pos(q,:,:) = testVar(1,:,1:2000);
filename = sprintf('data%d.txt', q);
writematrix(pos(q,:,:),filename,'Delimiter','space');
end
More insight you can find here:
https://www.mathworks.com/help/matlab/ref/writematrix.html

how to run multiple text file in matlab and get separate output file?

Hi I am having multiple text file name ABR1.txt,ABR2.txt,....ABR1000.txt
I have wrote down matlab code for distance calculation. So I want that all the file present in the current folder should run in this code and provide separate output file. So I am trying but only one ABR1.txt is giving output. Please check it and let me know what i can do?
clc
clear all
for n=1:2
filename = ['ABR', int2str(n), '.txt'];
Pop=load(filename);
[m n] = size(Pop);
n = m;
Dist = zeros(m, n);
for i = 1 : m
for j = 1 : n
Dist(i, j) = sqrt((Pop(i, 1) - Pop(j, 1)) ^ 2 + ...
(Pop(i, 2) - Pop(j, 2)) ^ 2);
end
end
Dist
q=(1-(3/8)*Dist)
filename = ['ABRa', int2str(n), '.txt'];
save(filename, 'q', '-ascii');
end
You are using the variable "n" both as your loop/file counter, and as the size of the matrix you read. I don't know what the data you're loading looks like, but if "n" ends up at a value of 1, then you'll only ever write the file name with "1" in it. You may actually be writing it twice, but both times have the same file name.
I would start by changing your for loop to use some other variable name:
for iFile = 1:2
Then also change it in the file name:
filename = ['ABRa', int2str(iFile), '.txt'];
Also, if you're really trying to read 1000 files, you should run your loop 1000 times, not twice:
for iFile = 1:1000

Read a value from multiple text files and write all into another file

I have a list of sub-folders and each sub-folder has a text file name called simass.txt. From each of the simass.txt files I extract a c{1}{2,3} cell data (as done in the code below) and write it to a file name features.txt in sequential form in single column.
I am facing a problem where at the end I only have a single value in features.txt, which I believe is due to the values being overwritten. I'm supposed to have 1000 values in features.txt since I have 1000 sub-folders.
What am I doing wrong?
clc; % Clear the command window.
workspace; % Make sure the workspace panel is showing.
format long g;
format compact;
% Define a starting folder wherever you want
start_path = fullfile(matlabroot, 'D:\Tools\Parameter Generation\');
% Ask user to confirm or change.
topLevelFolder = uigetdir(start_path);
if topLevelFolder == 0
return;
end
% Get list of all subfolders.
allSubFolders = genpath(topLevelFolder);
% Parse into a cell array.
remain = allSubFolders;
listOfFolderNames = {};
while true
[singleSubFolder, remain] = strtok(remain, ';');
if isempty(singleSubFolder)
break;
end
listOfFolderNames = [listOfFolderNames singleSubFolder];
end
numberOfFolders = length(listOfFolderNames)
% Process all text files in those folders.
for k = 1 : numberOfFolders
% Get this folder and print it out.
thisFolder = listOfFolderNames{k};
fprintf('Processing folder %s\n', thisFolder);
% Get filenames of all TXT files.
filePattern = sprintf('%s/simass.txt', thisFolder);
baseFileNames = dir(filePattern);
numberOfFiles = length(baseFileNames);
% Now we have a list of all text files in this folder.
if numberOfFiles >= 1
% Go through all those text files.
for f = 1 : numberOfFiles
fullFileName = fullfile(thisFolder, baseFileNames(f).name);
fileID=fopen(fullFileName);
c=textscan(fileID,'%s%s%s','Headerlines',10,'Collectoutput',true);
fclose(fileID);
%celldisp(c) % display all cell values
cellvalue=c{1}{2,3}
filePh = fopen('features.txt','w');
fprintf(filePh,cellvalue);
fclose(filePh);
fprintf(' Processing text file %s\n', fullFileName);
end
else
fprintf(' Folder %s has no text files in it.\n', thisFolder);
end
end
The problem is in the permission you use in fopen. From the documentation:
'w' - Open or create new file for writing. Discard existing contents, if any.
Which means you're discarding the contents every time, and you end up only having the last value. The fastest fix would be changing the permission to 'a', but I would suggest adding some changes to the code as follows:
Creating cellvalue before the loop, and read c{1}{2,3} into this new vector/cell array.
Perform the writing operation once, after cellvalue is fully populated.
cellvalue = cell(numberOfFiles,1);
for f = 1 : numberOfFiles
...
cellvalue{f} = c{1}{2,3};
end
fopen(...);
fprintf(...);
fclose(...);

Matlab Input/output of Several Files

I do matlab operation with two data file whose entries are complex numbers. For example,
fName = '1corr.txt';
f = dlmread('1EA.txt',',');
fid = fopen(fName);
tline = '';
Then I do matrix and other operations between these two files and write my output which I call 'modTrace' as:
modTrace
fileID = fopen('1out.txt','w');
v = [(0:(numel(modTrace)-1)).' real(modTrace(:)) ].';
fprintf(fileID,'%d %e\n',v);
The question is, if I have for example 100 pairs of such data files, like (2corr.txt, 2EA.txt), ....(50corr.txt, 50EA.txt) how can I generalize the input files and how to write all the output files at a time?
First of all, use sprintf to get your variable names depending on the current index.
corrName=sprintf('%dcorr.txt',idx);
EAName=sprintf('%dEA.txt',idx);
outName=sprintf('%dout.txt',idx);
This way, you have one variable (idx) which has to be changed.
Finally put everything into a loop:
n=100
for idx=1:n
corrName=sprintf('%dcorr.txt',idx);
EAName=sprintf('%dEA.txt',idx);
outName=sprintf('%dout.txt',idx);
f = dlmread(EAName,',');
fid = fopen(corrName);
tline = '';
modTrace
fileID = fopen(outName,'w');
v = [(0:(numel(modTrace)-1)).' real(modTrace(:)) ].';
fprintf(fileID,'%d %e\n',v);
end
Instead of hardcoding the number 100, you could also use n=numel(dir('*EA.txt')). It count's the files ending with EA.txt

How to read numbered sequence of .dat files into MATLAB

I am trying to load a numbered sequence of ".dat" named in the form a01.dat, a02.dat... a51.dat into MATLAB. I used the eval() function with the code below.
%% To load each ".dat" file for the 51 attributes to an array.
a = dir('*.dat');
for i = 1:length(a)
eval(['load ' a(i).name ' -ascii']);
end
attributes = length(a);
I ran into problems with that as I could not easily manipulate the data loaded with the eval function. And I found out the community is strongly against using eval. I used the csvread() with the code below.
% Scan folder for number of ".dat" files
datfiles = dir('*.dat');
% Count Number of ".dat" files
numfiles = length(datfiles);
% Read files in to MATLAB
for i = 1:1:numfiles
A{i} = csvread(datfiles(i).name);
end
The csvread() works for me but it reads the files but messes up the order when it reads the files. It reads a01.dat first and then a10.dat and a11.dat and so on instead of a01.dat, a02.dat... The contents of each files are signed numbers. Some are comma-delimited and single column and this is an even split. So a01.dat's contents are comma-delimited and a02.dat's content are in a single column.
Please how do I handle this?
Your problem seems to be sorting of the files. Drawing on a question on mathworks, this should help you:
datfiles = dir('*.mat');
name = {datfiles.name};
[~, index] = sort(name);
name = name(index);
And then you can loop with just name:
% Read files in to MATLAB
for i = 1:1:numfiles
A{i} = csvread(name{i});
end