Check for existence of array of files - matlab

I have a cell array containing file names. I want to check for the existence of all of these files in the subject folder, and if any one does not exist I wish to send a continue to the top-most for-loop (see mock code). Is there a way to do this in a one or two liner, instead of 1) using a for-loop and a double if-statement, or 2) building a function that for-loops over exist().
subjects = {'/data/subject01','/data/subject02','/data/subject03'};
files = {'a.txt','b.txt','c.txt'};
for ii = 1:numel(subjects)
    for jj = 1:numel(files)
        fileExists = exist([subjects{ii} '/' files{jj}], 'file');
        if ~fileExists
            break   % stop checking this subject at the first missing file
        end
    end
    if ~fileExists
        continue
    end
    % Some code to execute if all files exist.
end

The *fun functions are just loops internally and are generally slower than the explicit loop. They also very often unnecessarily obfuscate the intent and behavior of the code.
You can use ismember with all and dir to make the approach clearer and remove the unnecessary loop:
subjects = {'./data/subject01','./data/subject02'};
files = {'a.txt','b.txt','c.txt'};
for ii = 1:numel(subjects)
filelist = dir(fullfile(subjects{ii}, '*.txt'));
foundfilenames = {filelist(:).name};
if all(ismember(files, foundfilenames))
fprintf('All %u files are here: %s\n', numel(files), subjects{ii})
else
fprintf('All %u files are not here: %s\n', numel(files), subjects{ii})
end
end
With my folder structure:
/data
    /subject01
        a.txt
        b.txt
    /subject02
        a.txt
        b.txt
        c.txt
I see the following, as expected:
All 3 files are not here: ./data/subject01
All 3 files are here: ./data/subject02

You could remove the loop by iterating over all combinations of the two arrays:
subjects = {'/data/subject01','/data/subject02','/data/subject03'};
files = {'a.txt','b.txt','c.txt'};
a=numel(subjects);
b=numel(files);
k=a*b;
paths = arrayfun(@(ii)[subjects{mod(ii-1,a)+1} '/' files{ceil(ii/a)}],1:k,'uniformoutput',0);
checkExist = cellfun(@exist, paths, repmat({'file'},1,k));
if all(checkExist)
% Some code to execute if all files exist
end

Resolved it with a cellfun and by string arrays. Technically there is still a for-loop but it resolves the double if-statement. I will leave this question open for better solutions.
subjects = {'/data/subject01','/data/subject02','/data/subject03'};
files = string({'a.txt','b.txt','c.txt'});
for ii = 1:numel(subjects)
paths = subjects{ii} + "/" + files;
checkExist = cellfun(@exist, cellstr(paths), repmat({'file'},size(paths)));
if ~all(checkExist(:))
continue
end
% Some code to execute if all files exist.
end
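On R2017b or later, isfile accepts a cell array of paths and fullfile expands a cell array of file names, so the whole check fits on one line inside the subject loop. The following is a minimal sketch under that assumption; the temporary folder tree and the variable complete are hypothetical and exist only to make the example self-contained.

```matlab
% Build a throwaway folder tree so the sketch is self-contained (hypothetical names).
root = tempname;
subjects = {fullfile(root, 'subject01'), fullfile(root, 'subject02')};
files = {'a.txt', 'b.txt', 'c.txt'};
for ii = 1:numel(subjects)
    mkdir(subjects{ii});
end
% subject01 gets only a.txt and b.txt; subject02 gets all three files.
for f = files(1:2), fclose(fopen(fullfile(subjects{1}, f{1}), 'w')); end
for f = files,      fclose(fopen(fullfile(subjects{2}, f{1}), 'w')); end

complete = false(size(subjects));
for ii = 1:numel(subjects)
    % fullfile expands the cell array; isfile returns one logical per path.
    if ~all(isfile(fullfile(subjects{ii}, files)))
        continue
    end
    complete(ii) = true;   % code for subjects with all files goes here
end
rmdir(root, 's');          % clean up the throwaway tree
```

Unlike exist(..., 'file'), isfile is false for folders, so a folder that happens to be named a.txt would not count as a match.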

Related

Matlab skipping subfolders

I have a folder containing sequential subfolders 000001_wd, 000002_wd, ..., and I'm reading data from a file called 'plane.txt' in each one. Some of the subfolders don't contain that file. I want to skip them with an if/else inside the for-loop, but MATLAB reports that it is unable to open the file.
I tried changing or adding paths, but nothing seems to work.
workdir = 'D:\wass\test\output_925\';
cd(workdir)
data_frames = [1:1:37];
nframes = numel(data_frames);
V = zeros(nframes,3);
times = zeros(nframes,1);
ii=1;
prev = cd(workdir);
for frame = data_frames
fprintf('Processing frame %d\n',frame);
wdir = sprintf( '%s%06d_wd/', workdir, frame);
cd(wdir)
if exist('plane.txt')
plane_data = importdata([wdir,'plane.txt']);
times(ii) = double(ii-1)/fps;
else
times(ii) = double(ii-1)/fps;
end
ii=ii+1;
end
cd(prev);
fprintf('Saving data...\n');
I want to just continue through the loop until the last subfolder. Is there something I'm missing because the file I'm skipping is in a subfolder of my sequence?
The statement exist('plane.txt') tests to see if the file 'plane.txt' exists in the current directory. If it does, you read a file in the wdir subdirectory. Obviously, you haven't tested if that file exists.
I would simplify your code by reading the data within a try/catch block:
workdir = 'D:\wass\test\output_925\';
data_frames = 1:37; % <- don't use square brackets here, they're useless
nframes = numel(data_frames);
times = zeros(nframes,1);
for ii=1:nframes
frame = data_frames(ii);
fprintf('Processing frame %d\n',frame);
wdir = sprintf( '%s%06d_wd/', workdir, frame);
try
plane_data = importdata([wdir,'plane.txt']);
% do something with plane_data here...
catch
% ignore error
end
times(ii) = double(ii-1)/fps;
end
% ...
Note that I never used cd. You don't need to change directories to read data, and it's always better not to. The importdata statement uses an absolute path, so it does not matter what the current directory is.
A different approach involves getting a list of all files that match 'D:\wass\test\output_925\*\plane.txt':
files = dir(fullfile(workdir, '*', 'plane.txt'));
for ii=1:numel(files)
file = fullfile(files(ii).folder, files(ii).name);
plane_data = importdata(file);
% do something with plane_data here...
end
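If an explicit test is preferred over try/catch, the check can be made on the same absolute path that is later read, which avoids cd entirely. This is only a sketch: the temporary folder tree, the three-frame range, and the variable found are assumptions made so the example runs on its own.

```matlab
% Throwaway folder tree (hypothetical): frames 1 and 3 have plane.txt, frame 2 does not.
workdir = tempname;
data_frames = 1:3;
for frame = data_frames
    mkdir(fullfile(workdir, sprintf('%06d_wd', frame)));
end
for frame = [1 3]
    fid = fopen(fullfile(workdir, sprintf('%06d_wd', frame), 'plane.txt'), 'w');
    fprintf(fid, '1 2 3\n');
    fclose(fid);
end

found = false(size(data_frames));
for ii = 1:numel(data_frames)
    planeFile = fullfile(workdir, sprintf('%06d_wd', data_frames(ii)), 'plane.txt');
    % Test the exact path that will be read; exist returns 2 for a file.
    if exist(planeFile, 'file') ~= 2
        continue   % no plane.txt in this subfolder, skip it
    end
    plane_data = importdata(planeFile);
    found(ii) = true;
end
rmdir(workdir, 's');   % clean up the throwaway tree
```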

MATLAB dir without '.' and '..'

The function dir returns an array like
.
..
Folder1
Folder2
and every time I have to get rid of the first 2 items, with methods like :
for i=1:numel(folders)
foldername = folders(i).name;
if foldername(1) == '.' % do nothing
continue;
end
do_something(foldername)
end
and with nested loops it can result in a lot of repeated code.
So can I avoid these "folders" by an easier way?
Thanks for any help!
Edit:
Lately I have been dealing with this issue more simply, like this :
for i=3:numel(folders)
do_something(folders(i).name)
end
simply disregarding the first two items.
BUT, pay attention to @Jubobs' answer. Be careful with folder names that start with a character that has a smaller ASCII value than '.'; in that case the second method will fail. Also, if a folder name starts with a '.', then the first method will fail :)
So either make sure you have nice folder names and use one of my simple solutions, or use @Jubobs' solution to make sure.
A loop-less solution:
d=dir;
d=d(~ismember({d.name},{'.','..'}));
TL; DR
Scroll to the bottom of my answer for a function that lists directory contents except the '.' and '..' entries.
Detailed answer
The . and .. entries correspond to the current folder and the parent folder, respectively. In *nix shells, you can use commands like ls -lA to list everything but the '.' and '..' entries. Sadly, MATLAB's dir doesn't offer this functionality.
However, all is not lost. The elements of the output struct array returned by the dir function are actually ordered in lexicographical order based on the name field. This means that, if your current MATLAB folder contains files/folders that start with any character whose ASCII code point is smaller than that of the full stop (46, in decimal), then . and .. will not correspond to the first two elements of that struct array.
Here is an illustrative example: if your current MATLAB folder has the following structure (!hello and 'world being either files or folders),
.
├── !hello
└── 'world
then you get this
>> f = dir;
>> for k = 1 : length(f), disp(f(k).name), end
!hello
'world
.
..
Why are . and .. not the first two entries, here? Because both the exclamation point and the single quote have smaller code points (33 and 39, in decimal, resp.) than that of the full stop (46, in decimal).
I refer you to this ASCII table for an exhaustive list of the visible characters that have an ASCII code point smaller than that of the full stop; note that not all of them are necessarily legal filename characters, though.
A custom dir function that does not list . and ..
Right after invoking dir, you can always get rid of the two offending entries from the struct array before manipulating it. Moreover, for convenience, if you want to save yourself some mental overhead, you can always write a custom dir function that does what you want:
function listing = dir2(varargin)
if nargin == 0
name = '.';
elseif nargin == 1
name = varargin{1};
else
error('Too many input arguments.')
end
listing = dir(name);
inds = [];
n = 0;
k = 1;
while n < 2 && k <= length(listing)
if any(strcmp(listing(k).name, {'.', '..'}))
inds(end + 1) = k;
n = n + 1;
end
k = k + 1;
end
listing(inds) = [];
Test
Assuming the same directory structure as before, you get the following:
>> f = dir2;
>> for k = 1 : length(f), disp(f(k).name), end
!hello
'world
A solution similar to the one suggested by Tal is:
listing = dir(directoryname);
listing(1:2)=[]; % here you erase these . and .. items from listing
It has the advantage of using a very common trick in MATLAB, but it assumes that you know that the first two items of listing are . and .. (which you do in this case). The solution provided by Tal (which I did not try, though) seems to find the . and .. items even if they are not in the first two positions of listing.
Hope that helps ;)
If you're just using dir to get a list of files and directories, you can use MATLAB's ls function instead. On UNIX systems, this just returns the output of the shell's ls command, which may be faster than calling dir. The . and .. directories won't be displayed (unless your shell is set up to do so). Also, note that the behavior of this function is different between UNIX and Windows systems.
If you still want to use dir, and you test each file name explicitly, as in your example, it's a good idea to use strcmp (or one of its relations) instead of == to compare strings. The following would skip all hidden files and folders on UNIX systems:
listing = dir;
for i = 1:length(listing)
if ~strcmp(listing(i).name(1),'.')
% Do something
...
end
end
You may also want to exclude non-folder entries in addition to removing the dots:
d = dir('/path/to/parent/folder')
d(1:2)=[]; % removing dots
d = d([d.isdir]) % [d.isdir] returns a logical array of 1s representing folders and 0s for other entries
I used a = dir(folderPath); followed by two short lines that each return a struct array:
my_isdir = a([a.isdir])    % struct array containing only the folder entries
my_notdir = a(~[a.isdir])  % struct array containing only the non-folder entries
Combining @jubobs and @Tal solutions:
function d = dir2(folderPath)
% DIR2 lists the files in folderPath ignoring the '.' and '..' paths.
if nargin < 1
    folderPath = '.';   % default to the current folder
end
d = dir(folderPath);
d = d(~ismember({d.name},{'.','..'}));
end
None of the above puts together the elements as I see the question having been asked: obtain a list of directories only, while excluding the parents.
Just combining the elements, I would go with:
function d = dirsonly(folderPath)
% dirsonly lists the unhidden directories in folderPath ignoring '.' and '..'
% creating a simple cell array without the rest of the dir struct information
if nargin < 1
    folderPath = '.';   % default to the current folder
end
d = dir(folderPath);
d = {d([d.isdir] & ~ismember({d.name},{'.','..'})).name}.';
end
If hidden folders in general aren't wanted, the ismember line could be replaced with:
d = {d([d.isdir] & [~strncmp({d.name},'.',1)]).name}.';
If there are very large numbers of interfering non-directory files, it might be more efficient to separate the steps:
d = d([d.isdir]);
d = {d([~strncmp({d.name},'.',1)]).name}.';
We can use the function startsWith:
folders = dir(folderPath);
folders = string({folders.name});
folders = folders(~startsWith(folders,"."))
Potential Solution - just remove the fields
Files = dir;
FilesNew = Files(3:end);
You can just remove them as they are the first two "files" in the structure
Or if you are actually looking for specific file types:
Files = dir('*.mat');
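To see why filtering by name is safer than dropping the first two entries, here is a small sketch that builds a folder whose first entry sorts before '.'. The folder and file names (+pkg, world, notes.txt) are made up for illustration; '+' has ASCII code 43, which is below the 46 of the full stop.

```matlab
% Throwaway folder (hypothetical names) with one subfolder that sorts before '.'.
root = tempname;
mkdir(fullfile(root, '+pkg'));
mkdir(fullfile(root, 'world'));
fclose(fopen(fullfile(root, 'notes.txt'), 'w'));

d = dir(root);
d = d(~ismember({d.name}, {'.', '..'}));   % drop the two special entries by name
subfolders = sort({d([d.isdir]).name});    % keep folders only
files = {d(~[d.isdir]).name};              % keep non-folder entries only
rmdir(root, 's');                          % clean up the throwaway folder
```

Because the filter matches on the name field, it does not depend on where '.' and '..' land in the listing.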

Reconstruct directories from file MATLAB

Thanks for your help.
The problem is:
I need the user to select a file based on an extension, let's say .tif. I used the standard method, i.e.
[flnm,locn]=uigetfile({'*.tif','Image files'}, 'Select an image');
ext = '.tif';
But I need to fetch other image files from other subdirectories. Say the selected file is /user/blade/checklist/exp1/trial_1/run_1/exp001.tif, with its folder returned in locn. The images go up to exp100.tif.
I want to access:
/user/blade/checklist/exp1/trial_1/run_2/exp001.tif.
Also access:
/user/blade/checklist/exp1/trial_2/run_2/exp001.tif.
Up to trial_n
But if I list directory in /user/blade/checklist/exp1/, I get all folders therein from where I can reconstruct the right path. The naming structure is orderly.
My current solution is
[flnm,locn]=uigetfile({'*.tif','Image files'}, 'Select an image');
ext = '.tif';
parts = strsplit(locn, '/');
f = fullfile(parts{end-5}, parts{end-4}, parts{end-3}, parts{end-2}, parts{end-1});
Which is really ugly and I also lose the first /. Any help is appreciated.
Thanks!
First, get the file location as you did; note a small change I've made to make use of the variable ext.
ext = '.tif';
[flnm,locn]=uigetfile({['*',ext]}, 'Select an image');
parts = strsplit(locn,'/');
root = parts(1:end-4);
parts contains two pieces of information: 1) the path of the selected file, and 2) the path of your working folder, checklist, which is the part you need. So root holds the working folder.
Then, list out all the files you wanted, and put them in a cell array.
The file names should contain partial (subfolder) paths; it's not difficult to follow the pattern.
flist = {'trial_1/run_1/exp001.tif', ...
'trial_1/run_1/exp002.tif', ...
'trial_1/run_2/exp001.tif', ...
'trial_2/run_1/exp001.tif', ...
'trial_2/run_2/exp001.tif'};
I just enumerated a few; you can use a for loop to automatically generate trial_n and expxxx.tif. An example code to generate the complete file list (but not "full paths") -
flist = cell(10*2*100,1);
for ii = 1:10
for jj = 1:2
for kk = 1:100
flist{sub2ind([10,2,100],ii,jj,kk)} = ...
sprintf('trial_%d/run_%d/exp%03d%s', ii,...
jj, kk, ext);
end
end
end
Finally, use strjoin to concatenate the first part (your working folder) and second part (needed files in subfolders). Use cellfun to call strjoin for each cell in the file list cell array, so for every file you want you get a full path.
full_flist = cellfun(@(x) strjoin([root, x],'/'), ...
    flist, 'UniformOutput', false);
Example output -
>> locn
locn =
/home/user/Downloads/exp1/trial_1/run_1/
>> for ii = 1:5
full_flist{ii}
end
ans =
/home/user/Downloads/trial_1/run_1/exp001.tif
ans =
/home/user/Downloads/trial_1/run_1/exp002.tif
ans =
/home/user/Downloads/trial_1/run_2/exp001.tif
ans =
/home/user/Downloads/trial_2/run_1/exp001.tif
ans =
/home/user/Downloads/trial_2/run_2/exp001.tif
>>
Note: You can either use
strjoin({str1, str2}, '/')
or
sprintf('%s/%s', str1, str2)
They are equivalent. strjoin expects the pieces in a single cell array, followed by the delimiter.
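As an alternative to splitting on '/', fileparts can walk up the folder tree one level at a time, which keeps the leading separator intact. This is a sketch only: the value of locn is hard-coded as an assumption about what uigetfile returns, and the two-by-two range of trials and runs is illustrative.

```matlab
% Hypothetical value of locn, in the form uigetfile returns (folder with trailing separator).
locn = '/user/blade/checklist/exp1/trial_1/run_1/';
runDir = fileparts(locn);       % drops the trailing separator: .../trial_1/run_1
trialDir = fileparts(runDir);   % one level up: .../exp1/trial_1
expDir = fileparts(trialDir);   % two levels up: /user/blade/checklist/exp1

% Rebuild the path of exp001.tif for every trial/run combination.
targets = cell(2, 2);
for t = 1:2
    for r = 1:2
        targets{t, r} = fullfile(expDir, sprintf('trial_%d', t), ...
                                 sprintf('run_%d', r), 'exp001.tif');
    end
end
```

fullfile inserts the platform's own separator, so on Windows the results contain backslashes rather than forward slashes.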

dir(myFiles{m}) doesn't work MATLAB

myFiles = 1x7 cell
when I try
for m =1:numel(myFiles )
fil{m} = dir(myFiles {m});
fil{m}.bytes ;
end
This is not working
I got the error :
function is not defined for 'cell' inputs.
First of all you should mention the error message you get.
Now, besides that are some obvious problems:
myFiles {ii}
This is not valid syntax to index into a cell array. Perhaps removing the space helps.
Furthermore you loop over m and then use ii as an index.
Lastly you assign to fil every time. In practice this means only the last result is stored. Perhaps assigning to fil(m) would suit your needs better.
The command dir will show you the content of a folder. As your variable is named "myFiles" I assume it contains filenames and not foldernames. So I think you're rather looking for a loop like this:
for ii = 1:numel(myFiles)
fil{ii} = which( myFiles{ii} )
end
which gives you an array with the full paths to your files. Or are you looking for the folders containing the files in "myFiles"? Then you can use:
for ii = 1:numel(myFiles)
fil{ii} = fileparts( which( myFiles{ii} ) )
end
returning you the corresponding folders.
regarding your comments:
the existence of the files/folders in "myFiles" is the only purpose?
Then you could do that:
for ii = 1:numel(myFiles)
fil(ii) = exist( which(myFiles{ii}), 'file' );
end
existMyFiles = logical(fil);
returning a logical array specifying the existence of your files.
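If the goal is the file sizes, note that dir called on a single file name returns a one-element struct for that file, including its bytes field. The sketch below assumes myFiles is a flat cell array of char vectors (a nested cell would reproduce the "not defined for 'cell' inputs" error); the two temporary files and the variable sizes are hypothetical.

```matlab
% Two throwaway files (hypothetical names) of known size.
folder = tempname;
mkdir(folder);
myFiles = {fullfile(folder, 'a.bin'), fullfile(folder, 'b.bin')};
fid = fopen(myFiles{1}, 'w'); fwrite(fid, 'hello'); fclose(fid);   % 5 bytes
fid = fopen(myFiles{2}, 'w'); fwrite(fid, 'hi');    fclose(fid);   % 2 bytes

sizes = zeros(size(myFiles));
for m = 1:numel(myFiles)
    info = dir(myFiles{m});   % brace indexing passes a char vector, not a cell
    sizes(m) = info.bytes;    % size of the file in bytes
end
rmdir(folder, 's');           % clean up the throwaway folder
```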

Load Multiple .mat Files to Matlab workspace

I'm trying to load several .mat files to the workspace. However, they seem to overwrite each other. Instead, I want them to append. I am aware that I can do something like:
S=load(file1)
R=load(file2)
etc.
and then append the variables manually.
But there's a ton of variables, and making an append statement for each one is extremely undesirable (though possible as a last resort). Is there some way for me to load .mat files to the workspace (by using the load() command without assignment) and have them append?
It's not entirely clear what you mean by "append" but here's a way to get the data loaded into a format that should be easy to deal with:
file_list = {'file1';'file2';...};
for file = file_list'
    loaded.(char(file)) = load(char(file));   % file is a 1x1 cell, so convert it for load too
end
This makes use of dynamic field references to load the contents of each file in the list into its own field of the loaded structure. You can iterate over the fields and manipulate the data however you'd like from here.
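If "append" means stacking same-named variables from every file, the structs returned by load can be merged field by field. The following is a rough sketch under that assumption; the two temporary MAT files, the variables A and B, and the choice to concatenate along dimension 1 are all made up for illustration.

```matlab
% Two throwaway MAT files (hypothetical) that contain the same variable names.
f1 = [tempname '.mat'];
f2 = [tempname '.mat'];
A = [1 2]; B = 10; save(f1, 'A', 'B');
A = [3 4]; B = 20; save(f2, 'A', 'B');

merged = struct();
for file = {f1, f2}
    S = load(file{1});                 % load into a struct instead of the workspace
    names = fieldnames(S);
    for k = 1:numel(names)
        if isfield(merged, names{k})
            % Variable seen before: stack the new value below the existing rows.
            merged.(names{k}) = cat(1, merged.(names{k}), S.(names{k}));
        else
            merged.(names{k}) = S.(names{k});
        end
    end
end
delete(f1);
delete(f2);
```

Loading into a struct keeps each file's variables from overwriting the previous ones, which is what happens when load is called without an output argument.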
It sounds like you have a situation in which each file contains a matrix variable A and you want to load into memory the concatenation of all these matrices along some dimension. I had a similar need, and wrote the following function to handle it.
function var = loadCat( dim, files, varname )
%LOADCAT Concatenate variables of same name appearing in multiple MAT files
%
% where dim is dimension to concatenate along,
% files is a cell array of file names, and
% varname is a string containing the name of the desired variable
if( isempty( files ) )
var = [];
return;
end
var = load( files{1}, varname );
var = var.(varname);
for f = 2:numel(files),
newvar = load( files{f}, varname );
if( isfield( newvar, varname ) )
var = cat( dim, var, newvar.(varname) );
else
warning( 'loadCat:missingvar', [ 'File ' files{f} ' does not contain variable ' varname ] );
end
end
end
Clark's answer and function actually solved my situation perfectly... I just added the following bit of code to make it a little less tedious. Just add this to the beginning and get rid of the "files" argument:
[files,pathname] = uigetfile('*.mat', 'Select MAT files (use CTRL/COMM or SHIFT)', ...
'MultiSelect', 'on');
Alternatively, it could be even more efficient to just start with this bit:
[pathname] = uigetdir('C:\');
files = dir( fullfile(pathname,'*.mat') ); %# list all *.mat files
files = {files.name}'; %# file names
data = cell(numel(files),1); %# store file contents
for i=1:numel(files)
fname = fullfile(pathname,files{i}); %# full path to file
data{i} = load(fname); %# load file
end
(modified from process a list of files with a specific extension name in matlab).
Thanks,
Jason