Finding pairs of files in a folder - MATLAB

I have a folder with tens of thousands of files. Every file in the folder should have a partner whose name matches except for the first few letters, for example:
X_Date_Time_Place.dat
Y_Date_Time_Place.dat
Each X_* and Y_* combine to make one pair of files.
However, there are always a few thousand extra files that need to be eliminated from the folder. The extra files are of the same type but have no partner. For example, there may be more 'X_Date_Time_Place.dat' files than 'Y_Date_Time_Place.dat' files. The only variable parts of the file names are 'Date', 'Time' and 'Place'.
I have written a simple script (using a for loop) that takes the name of one file and checks all the other files in a loop until it finds its match. However, it takes an enormous amount of time to find a pair.
Is there any faster and more efficient way to do it?

You can split to two lists:
xlist = dir( fullfile( path_to_folder, 'X_*.dat') );
ylist = dir( fullfile( path_to_folder, 'Y_*.dat') );
%// remove prefixes
xlist = cellfun(@(x) x(3:end), {xlist.name}, 'uni', false);
ylist = cellfun(@(y) y(3:end), {ylist.name}, 'uni', false);
common = intersect(xlist, ylist);
Using intersect to find the common suffixes leaves you with common holding all Date_Time_Place.dat for which you have BOTH X_Date_Time_Place.dat and Y_Date_Time_Place.dat.
To get all pairs:
allPairs = cellfun(@(c) {fullfile(path_to_folder,['X_',c]), ...
fullfile(path_to_folder,['Y_',c])}, common, 'uni', false);
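Since the goal is to eliminate the files that have no partner, the same suffix lists could also be passed to setdiff. The following is only a minimal sketch: the suffixes are made-up sample values standing in for the xlist and ylist built above, and the delete call is left commented out.

```matlab
% Hypothetical suffix lists (prefix already stripped), for illustration only
xlist = {'d1_t1_p1.dat', 'd1_t2_p1.dat', 'd2_t1_p2.dat'};
ylist = {'d1_t1_p1.dat', 'd3_t1_p1.dat'};

% Suffixes that appear on one side only
onlyX = setdiff(xlist, ylist);   % X files with no Y partner
onlyY = setdiff(ylist, xlist);   % Y files with no X partner

% Rebuild the full names of the unpaired files
unpaired = [strcat('X_', onlyX), strcat('Y_', onlyY)];
disp(unpaired.');

% To remove them, one could loop over the list (uncomment with care):
% for k = 1:numel(unpaired)
%     delete(fullfile(path_to_folder, unpaired{k}));
% end
```

Like intersect, setdiff works on the sorted lists, so this avoids comparing every file against every other file.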

You can use the function dir and specify a string and/or an extension that the file names must contain.
In your example:
I = dir('*_Date_Time_Place*.dat')
This returns a struct array I with one element for every file whose name contains _Date_Time_Place and has the extension .dat (Date, Time and Place stand for the actual values you are looking for).
You can then access the elements with I(1), I(2), and a file name with I(1).name.
Minor note :
For this to work, your current folder must be the one where your files are.

Well, I don't have 10,000 files formatted like this but here is what I would do.
Xfiles = dir('X*.dat');
filenames = {Xfiles.name};
% Here I would determine how many pairs I am looking for (the unique X's)
% I am assuming that your X files are unique.
% remove the "X" from the file name
filenames2 = regexprep(filenames, '^X', '');
keys = filenames2;
values = 1:length(filenames2);
fileMap = containers.Map(keys, values);
% for each Y look for the filename
Yfiles = dir('Y*.dat');
% remove the leading "Y" from the file names
Yfiles2 = regexprep({Yfiles.name}, '^Y', '');
pairs = cell(length(Yfiles2),2);
% a Y without a matching X leaves its row of pairs empty
for x = 1:length(Yfiles2)
if isKey(fileMap, Yfiles2{x})
idx = fileMap(Yfiles2{x});
pairs(x,:) = {Xfiles(idx), Yfiles(x)};
end
end

Related

How to use file name to sort and count files stored in a .mat file?

dbhole.mat file contains files name like: d1h1,d1h2,d1h3,d1h4,d2h1,d2h2,d3h1,d3h2,d3h4,d3h5,d3h6.
I want to count the number of files having a name that starts with d1 then d2 ,d3 and so on in a loop.
If you mean that you want to get a list of the variables in a *.mat file that start with d1, d2, etc., you could use who and matfile. who accepts a regular expression, which you can tailor to the variables you want to see.
matobj = matfile('filename.mat');
d1vars = who(matobj, '-regexp', '^d1h');
nD1 = numel(d1vars);
Or more generally in a loop
for k = 1:3
vars{k} = who(matobj, '-regexp', ['^d', num2str(k), 'h']);
% And get the number
nVars(k) = numel(vars{k});
end
If you have an older version of MATLAB, you can load the file into a struct and then check the fields of that struct for the pattern that you'd like.
data = load('filename.mat');
variables = fieldnames(data);
isd1 = variables(~cellfun(@isempty, regexp(variables, '^d1h')));
nD1 = numel(isd1);
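If the list of names is already available as a cell array, all groups could also be counted in one pass. This is only a sketch: it uses the names from the question, and it assumes every name matches the pattern d<number>h<number> (a non-matching name would make the cellfun call fail).

```matlab
% Variable names taken from the question
names = {'d1h1','d1h2','d1h3','d1h4','d2h1','d2h2', ...
         'd3h1','d3h2','d3h4','d3h5','d3h6'};

% Capture the number that follows the leading 'd' in each name
tok = regexp(names, '^d(\d+)h', 'tokens', 'once');
groupId = cellfun(@(t) str2double(t{1}), tok);

% Count how many names fall into each group
[groups, ~, idx] = unique(groupId);
counts = accumarray(idx(:), 1).';

for k = 1:numel(groups)
    fprintf('d%d: %d variables\n', groups(k), counts(k));
end
```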

MATLAB - create a list for multiple files

I don't know if the title is appropriate, but I need to import several files, e.g. 25 (files like info.asd, ina.asd, sdd.asd, etc.). In my opinion it should be possible to import them via a for loop instead of hardcoding the operation. Any ideas how to build the list in MATLAB, so the software would know what to import?
You can do it without loop with this function. sPath is the path containing your files and sExt is the extension of the files you want to list.
function cList = fileList(sPath, sExt)
if nargin == 1
sExt = '.asd';
end
% List files in the given path
stDir = dir(sPath);
tDir = struct2table(stDir);
tFile = tDir(~tDir.isdir, :);
% Keep only file with the right extension
cList = tFile.name;
[~, cList, cExt] = cellfun(@fileparts , ...
cList , ...
'UniformOutput', false);
vIsIni = cellfun(@(x) strcmpi(x, sExt), cExt);
cList = cList(vIsIni);
end

MATLAB dir without '.' and '..'

the function dir returns an array like
.
..
Folder1
Folder2
and every time I have to get rid of the first 2 items, with methods like :
for i=1:numel(folders)
foldername = folders(i).name;
if foldername(1) == '.' % do nothing
continue;
end
do_something(foldername)
end
and with nested loops it can result in a lot of repeated code.
So can I avoid these "folders" by an easier way?
Thanks for any help!
Edit:
Lately I have been dealing with this issue more simply, like this :
for i=3:numel(folders)
do_something(folders(i).name)
end
simply disregarding the first two items.
BUT, pay attention to @Jubobs' answer. Be careful with folder names that start with a nasty character that has a smaller ASCII value than the full stop (.): then the second method will fail. Also, if a name starts with a ., then the first method will fail :)
So either make sure you have nice folder names and use one of my simple solutions, or use @Jubobs' solution to make sure.
A loop-less solution:
d=dir;
d=d(~ismember({d.name},{'.','..'}));
TL; DR
Scroll to the bottom of my answer for a function that lists directory contents except the . and .. entries.
Detailed answer
The . and .. entries correspond to the current folder and the parent folder, respectively. In *nix shells, you can use commands like ls -lA to list everything but the . and .. entries. Sadly, MATLAB's dir doesn't offer this functionality.
However, all is not lost. The elements of the output struct array returned by the dir function are actually ordered in lexicographical order based on the name field. This means that, if your current MATLAB folder contains files/folders whose names start with any character of ASCII code point smaller than that of the full stop (46, in decimal), then . and .. will not correspond to the first two elements of that struct array.
Here is an illustrative example: if your current MATLAB folder has the following structure (!hello and 'world being either files or folders),
.
├── !hello
└── 'world
then you get this
>> f = dir;
>> for k = 1 : length(f), disp(f(k).name), end
!hello
'world
.
..
Why are . and .. not the first two entries, here? Because both the exclamation point and the single quote have smaller code points (33 and 39, in decimal, resp.) than that of the full stop (46, in decimal).
I refer you to this ASCII table for an exhaustive list of the visible characters that have an ASCII code point smaller than that of the full stop; note that not all of them are necessarily legal filename characters, though.
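The ordering argument can be checked without touching the file system. The sketch below does not call dir at all; it only sorts the names from the example by character code, which is the ordering described above.

```matlab
% Names from the example above, deliberately listed with the dots first
names = {'.', '..', '!hello', '''world'};

% Code points of the leading characters: '.' = 46, '!' = 33, '''' = 39
leadCodes = cellfun(@(s) double(s(1)), names);

% sort orders a cell array of character vectors by character code
sorted = sort(names);
disp(sorted.');
```

Both !hello and 'world end up ahead of . and .., so dropping the first two entries would discard the wrong names.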
A custom dir function that does not list . and ..
Right after invoking dir, you can always get rid of the two offending entries from the struct array before manipulating it. Moreover, for convenience, if you want to save yourself some mental overhead, you can always write a custom dir function that does what you want:
function listing = dir2(varargin)
if nargin == 0
name = '.';
elseif nargin == 1
name = varargin{1};
else
error('Too many input arguments.')
end
listing = dir(name);
inds = [];
n = 0;
k = 1;
while n < 2 && k <= length(listing)
if any(strcmp(listing(k).name, {'.', '..'}))
inds(end + 1) = k;
n = n + 1;
end
k = k + 1;
end
listing(inds) = [];
Test
Assuming the same directory structure as before, you get the following:
>> f = dir2;
>> for k = 1 : length(f), disp(f(k).name), end
!hello
'world
A solution similar to the one suggested by Tal is:
listing = dir(directoryname);
listing(1:2)=[]; % here you erase these . and .. items from listing
It has the advantage of using a very common MATLAB trick, but it assumes that the first two items of listing are . and .., which is only true when no name in the folder sorts before the full stop. The solution provided by Tal (which I did not try, though) finds the . and .. items even if they are not in the first two positions of listing.
Hope that helps ;)
If you're just using dir to get a list of files and directories, you can use MATLAB's ls function instead. On UNIX systems, this just returns the output of the shell's ls command, which may be faster than calling dir. The . and .. directories won't be displayed (unless your shell is set up to do so). Also, note that the behavior of this function is different between UNIX and Windows systems.
If you still want to use dir, and you test each file name explicitly, as in your example, it's a good idea to use strcmp (or one of its relatives) instead of == to compare strings. The following would skip all hidden files and folders on UNIX systems:
listing = dir;
for i = 1:length(listing)
if ~strcmp(listing(i).name(1),'.')
% Do something
...
end
end
Besides removing the dot entries, you may also want to exclude everything that is not a folder:
d = dir('/path/to/parent/folder')
d(1:2)=[]; % removing dots
d = d([d.isdir]) % [d.isdir] returns a logical array of 1s representing folders and 0s for other entries
I used: a = dir(folderPath);
Then I used two short statements, each of which returns a struct array:
my_isdir = a([a.isdir]) % struct array with folder entries only
my_notdir = a(~[a.isdir]) % struct array with non-folder entries only
Combining @jubobs' and @Tal's solutions:
function d = dir2(folderPath)
% DIR2 lists the files in folderPath ignoring the '.' and '..' paths.
if nargin < 1, folderPath = '.'; end
d = dir(folderPath);
d = d(~ismember({d.name},{'.','..'}));
end
None of the above puts the elements together in the way I read the question as having been asked: obtain a list of directories only, while excluding the parents.
Just combining the elements, I would go with:
function d = dirsonly(folderPath)
% dirsonly lists the unhidden directories in folderPath ignoring '.' and '..'
% creating a simple cell array without the rest of the dir struct information
if nargin < 1, folderPath = '.'; end
d = dir(folderPath);
d = {d([d.isdir] & [~ismember({d.name},{'.','..'})]).name}.';
end
If hidden folders in general aren't wanted, the ismember line could be replaced with:
d = {d([d.isdir] & [~strncmp({d.name},'.',1)]).name}.';
If there were a very large number of interfering non-directory files, it might be more efficient to separate the steps:
d = d([d.isdir]);
d = {d([~strncmp({d.name},'.',1)]).name}.';
We can use the function startsWith
folders = dir(folderPath); % folderPath holds the path of the folder to list
folders = string({folders.name});
folders = folders(~startsWith(folders,"."))
Potential Solution - just remove the fields
Files = dir;
FilesNew = Files(3:end);
You can just remove them as they are the first two "files" in the structure
Or if you are actually looking for specific file types:
Files = dir('*.mat');

MATLAB Reading several images from folder

I have a folder called BasePics within a folder called Images. Inside BasePics there are 30 JPEG images. I'm wondering if the following is possible: Can a script be written that reads all of these images using the imread() command. The names of the images are somewhat sequential: C1A_Base.jpg, C1B_Base.jpg, C1C_Base.jpg, C2A_Base.jpg, C2B_Base.jpg, C2C_Base.jpg, etc.... all the way up to C10C_Base.jpg
Can a loop be used somehow:
file = dir('Images\BasePics');
NF = length(file);
for k = 1:NF
images{k} = imread(fullfile('Images\BasePics', file(k).name));
imagesc(images{k})
end
This is a rough idea of what I want to do, but I'm wondering if it can be done with the current naming format I have in the Images folder. I would also like each image that is read to be its own variable, with the same or a similar name to the one it has in the folder Images\BasePics, rather than having a concatenated array of 30 images all under the one variable images. I would like to have 30 separate variables, with names such as A1, A2, A3, B1, B2, B3, etc.
Also when I just ask for:
dir images\BasePics
MATLAB outputs 33 files instead of 30. There are two extra entries at the beginning of the listing, '.' and '..', and one at the end, 'Thumbs.db'. These do not show up when I look at the folder separately. Is there a way to programmatically have MATLAB skip over them?
Thanks!!
Since you know the names of the files in advance, you can skip the dir and go ahead and read the files:
for l = 'ABC'
for n=1:10
nm = sprintf('C%d%c_Base.jpg', n, l );
fnm = sprintf('%c%d', l, n );
imgs.(fnm) = imread( fullfile('images','BasePics', nm ) );
end
end
Now you have a struct imgs with fields A1...C10 for each image.
You are very close. I would just use dir('Images\BasePics\*.jpg') to get rid of the extraneous files.
The naming system you want will not lend itself to additional batch processing (do you really want to type all of A1, A2, etc?). I would either keep it sequential, and store a list of the filenames to match, or use a struct array, like images.C1A, etc.
dirlist = dir('Images\BasePics\*.jpg');
for k = 1:length(dirlist);
fname = dirlist(k).name;
[~, name] = fileparts(fname); % separate out base name of file (avoids shadowing the path function)
images.(name) = imread(fullfile('Images\BasePics', fname));
end

Load Multiple .mat Files to Matlab workspace

I'm trying to load several .mat files to the workspace. However, they seem to overwrite each other. Instead, I want them to append. I am aware that I can do something like:
S=load(file1)
R=load(file2)
etc.
and then append the variables manually.
But there's a ton of variables, and making an append statement for each one is extremely undesirable (though possible as a last resort). Is there some way for me to load .mat files to the workspace (by using the load() command without assignment) and have them append?
It's not entirely clear what you mean by "append", but here's a way to get the data loaded into a format that should be easy to deal with:
file_list = {'file1';'file2';...};
for file = file_list'
loaded.(char(file)) = load(char(file)); % file is a 1x1 cell here, so convert it for load as well
end
This makes use of dynamic field references to load the contents of each file in the list into its own field of the loaded structure. You can iterate over the fields and manipulate the data however you'd like from here.
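Iterating over the fields could look roughly like the sketch below. The loaded structure is a hand-built stand-in so the example runs on its own, and the variable name A is an assumption made purely for illustration.

```matlab
% Stand-in for the structure built by the loop above; in practice each field
% would hold the struct returned by load. Field and variable names are made up.
loaded.file1 = struct('A', [1 2 3]);
loaded.file2 = struct('A', [4 5]);

% Walk over the files and collect variable A from each one
names = fieldnames(loaded);
allA = [];
for k = 1:numel(names)
    s = loaded.(names{k});
    if isfield(s, 'A')
        allA = [allA, s.A]; %#ok<AGROW>
    end
end
disp(allA);
```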
It sounds like you have a situation in which each file contains a matrix variable A and you want to load into memory the concatenation of all these matrices along some dimension. I had a similar need, and wrote the following function to handle it.
function var = loadCat( dim, files, varname )
%LOADCAT Concatenate variables of same name appearing in multiple MAT files
%
% where dim is dimension to concatenate along,
% files is a cell array of file names, and
% varname is a string containing the name of the desired variable
if( isempty( files ) )
var = [];
return;
end
var = load( files{1}, varname );
var = var.(varname);
for f = 2:numel(files),
newvar = load( files{f}, varname );
if( isfield( newvar, varname ) )
var = cat( dim, var, newvar.(varname) );
else
warning( 'loadCat:missingvar', 'File %s does not contain variable %s', files{f}, varname );
end
end
end
Clark's answer and function actually solved my situation perfectly... I just added the following bit of code to make it a little less tedious. Just add this to the beginning and get rid of the "files" argument:
[files,pathname] = uigetfile('*.mat', 'Select MAT files (use CTRL/COMM or SHIFT)', ...
'MultiSelect', 'on');
Alternatively, it could be even more efficient to just start with this bit:
[pathname] = uigetdir('C:\');
files = dir( fullfile(pathname,'*.mat') ); %# list all *.mat files
files = {files.name}'; %# file names
data = cell(numel(files),1); %# store file contents
for i=1:numel(files)
fname = fullfile(pathname,files{i}); %# full path to file
data{i} = load(fname); %# load file
end
(modified from process a list of files with a specific extension name in matlab).
Thanks,
Jason