I need to get all those files under D:\dic and loop over them to further process individually.
Does MATLAB support this kind of operations?
It can be done in other scripts like PHP,Python...
Update: Given that this post is quite old, and I've modified this utility a lot for my own use during that time, I thought I should post a new version. My newest code can be found on The MathWorks File Exchange: dirPlus.m. You can also get the source from GitHub.
I made a number of improvements. It now gives you options to prepend the full path or return just the file name (incorporated from Doresoom and Oz Radiano) and apply a regular expression pattern to the file names (incorporated from Peter D). In addition, I added the ability to apply a validation function to each file, allowing you to select them based on criteria other than just their names (i.e. file size, content, creation date, etc.).
NOTE: In newer versions of MATLAB (R2016b and later), the dir function has recursive search capabilities! So you can do this to get a list of all *.m files in all subfolders of the current folder:
dirData = dir('**/*.m');
Old code: (for posterity)
Here's a function that searches recursively through all subdirectories of a given directory, collecting a list of all file names it finds:
function fileList = getAllFiles(dirName)
dirData = dir(dirName); %# Get the data for the current directory
dirIndex = [dirData.isdir]; %# Find the index for directories
fileList = {dirData(~dirIndex).name}'; %'# Get a list of the files
if ~isempty(fileList)
fileList = cellfun(#(x) fullfile(dirName,x),... %# Prepend path to files
fileList,'UniformOutput',false);
end
subDirs = {dirData(dirIndex).name}; %# Get a list of the subdirectories
validIndex = ~ismember(subDirs,{'.','..'}); %# Find index of subdirectories
%# that are not '.' or '..'
for iDir = find(validIndex) %# Loop over valid subdirectories
nextDir = fullfile(dirName,subDirs{iDir}); %# Get the subdirectory path
fileList = [fileList; getAllFiles(nextDir)]; %# Recursively call getAllFiles
end
end
After saving the above function somewhere on your MATLAB path, you can call it in the following way:
fileList = getAllFiles('D:\dic');
You're looking for dir to return the directory contents.
To loop over the results, you can simply do the following:
dirlist = dir('.');
for i = 1:length(dirlist)
dirlist(i)
end
This should give you output in the following format, e.g.:
name: 'my_file'
date: '01-Jan-2010 12:00:00'
bytes: 56
isdir: 0
datenum: []
I used the code mentioned in this great answer and expanded it to support 2 additional parameters which I needed in my case. The parameters are file extensions to filter on and a flag indicating whether to concatenate the full path to the name of the file or not.
I hope it is clear enough and someone will finds it beneficial.
function fileList = getAllFiles(dirName, fileExtension, appendFullPath)
dirData = dir([dirName '/' fileExtension]); %# Get the data for the current directory
dirWithSubFolders = dir(dirName);
dirIndex = [dirWithSubFolders.isdir]; %# Find the index for directories
fileList = {dirData.name}'; %'# Get a list of the files
if ~isempty(fileList)
if appendFullPath
fileList = cellfun(#(x) fullfile(dirName,x),... %# Prepend path to files
fileList,'UniformOutput',false);
end
end
subDirs = {dirWithSubFolders(dirIndex).name}; %# Get a list of the subdirectories
validIndex = ~ismember(subDirs,{'.','..'}); %# Find index of subdirectories
%# that are not '.' or '..'
for iDir = find(validIndex) %# Loop over valid subdirectories
nextDir = fullfile(dirName,subDirs{iDir}); %# Get the subdirectory path
fileList = [fileList; getAllFiles(nextDir, fileExtension, appendFullPath)]; %# Recursively call getAllFiles
end
end
Example for running the code:
fileList = getAllFiles(dirName, '*.xml', 0); %#0 is false obviously
You can use regexp or strcmp to eliminate . and ..
Or you could use the isdir field if you only want files in the directory, not folders.
list=dir(pwd); %get info of files/folders in current directory
isfile=~[list.isdir]; %determine index of files vs folders
filenames={list(isfile).name}; %create cell array of file names
or combine the last two lines:
filenames={list(~[list.isdir]).name};
For a list of folders in the directory excluding . and ..
dirnames={list([list.isdir]).name};
dirnames=dirnames(~(strcmp('.',dirnames)|strcmp('..',dirnames)));
From this point, you should be able to throw the code in a nested for loop, and continue searching each subfolder until your dirnames returns an empty cell for each subdirectory.
This answer does not directly answer the question but may be a good solution outside of the box.
I upvoted gnovice's solution, but want to offer another solution: Use the system dependent command of your operating system:
tic
asdfList = getAllFiles('../TIMIT_FULL/train');
toc
% Elapsed time is 19.066170 seconds.
tic
[status,cmdout] = system('find ../TIMIT_FULL/train/ -iname "*.wav"');
C = strsplit(strtrim(cmdout));
toc
% Elapsed time is 0.603163 seconds.
Positive:
Very fast (in my case for a database of 18000 files on linux).
You can use well tested solutions.
You do not need to learn or reinvent a new syntax to select i.e. *.wav files.
Negative:
You are not system independent.
You rely on a single string which may be hard to parse.
I don't know a single-function method for this, but you can use genpath to recurse a list of subdirectories only. This list is returned as a semicolon-delimited string of directories, so you'll have to separate it using strread, i.e.
dirlist = strread(genpath('/path/of/directory'),'%s','delimiter',';')
If you don't want to include the given directory, remove the first entry of dirlist, i.e. dirlist(1)=[]; since it is always the first entry.
Then get the list of files in each directory with a looped dir.
filenamelist=[];
for d=1:length(dirlist)
% keep only filenames
filelist=dir(dirlist{d});
filelist={filelist.name};
% remove '.' and '..' entries
filelist([strmatch('.',filelist,'exact');strmatch('..',filelist,'exact'))=[];
% or to ignore all hidden files, use filelist(strmatch('.',filelist))=[];
% prepend directory name to each filename entry, separated by filesep*
for f=1:length(filelist)
filelist{f}=[dirlist{d} filesep filelist{f}];
end
filenamelist=[filenamelist filelist];
end
filesep returns the directory separator for the platform on which MATLAB is running.
This gives you a list of filenames with full paths in the cell array filenamelist. Not the neatest solution, I know.
This is a handy function for getting filenames, with the specified format (usually .mat) in a root folder!
function filenames = getFilenames(rootDir, format)
% Get filenames with specified `format` in given `foler`
%
% Parameters
% ----------
% - rootDir: char vector
% Target folder
% - format: char vector = 'mat'
% File foramt
% default values
if ~exist('format', 'var')
format = 'mat';
end
format = ['*.', format];
filenames = dir(fullfile(rootDir, format));
filenames = arrayfun(...
#(x) fullfile(x.folder, x.name), ...
filenames, ...
'UniformOutput', false ...
);
end
In your case, you can use the following snippet :)
filenames = getFilenames('D:/dic/**');
for i = 1:numel(filenames)
filename = filenames{i};
% do your job!
end
With little modification but almost similar approach to get the full file path of each sub folder
dataFolderPath = 'UCR_TS_Archive_2015/';
dirData = dir(dataFolderPath); %# Get the data for the current directory
dirIndex = [dirData.isdir]; %# Find the index for directories
fileList = {dirData(~dirIndex).name}'; %'# Get a list of the files
if ~isempty(fileList)
fileList = cellfun(#(x) fullfile(dataFolderPath,x),... %# Prepend path to files
fileList,'UniformOutput',false);
end
subDirs = {dirData(dirIndex).name}; %# Get a list of the subdirectories
validIndex = ~ismember(subDirs,{'.','..'}); %# Find index of subdirectories
%# that are not '.' or '..'
for iDir = find(validIndex) %# Loop over valid subdirectories
nextDir = fullfile(dataFolderPath,subDirs{iDir}); %# Get the subdirectory path
getAllFiles = dir(nextDir);
for k = 1:1:size(getAllFiles,1)
validFileIndex = ~ismember(getAllFiles(k,1).name,{'.','..'});
if(validFileIndex)
filePathComplete = fullfile(nextDir,getAllFiles(k,1).name);
fprintf('The Complete File Path: %s\n', filePathComplete);
end
end
end
Related
I have a set of files in a folder that I want to convert to a different type using MATLAB, ie. for each file in this folder do "x".
I have seen answers that suggest the use of the "dir" function to create a structure that contains all the files as elements (Loop through files in a folder in matlab). However, this function requires me to specify the file type (.csv, .txt, etc). The files I want to process are from a very old microscope and have the file suffixes .000, .001, .002, etc so I'm unable to use the solutions suggested elsewhere.
Does anyone know how I can get around this problem? This is my first time using MATLAB so any help would be greatly appreciated. All the best!
As suggested by beaker and Ander Biguri, you can do this by getting all files and then sorting.
Here is the related code with explanations inline:
baseDir = 'C:\mydir';
files = dir(baseDir);
files( [ files.isdir] ) = []; % This removes directories, especially '.' and '..'.
fileNames = cell(size(files)); % In this cell, we store the file names.
fileExt = cell(size(files)); % In this cell, we store the file extensions.
for k = 1 : numel(fileNames )
cf = files(k);
fileNames{k} = [ cf.folder filesep cf.name]; % Get file name
[~,~, fileExt{k}] = fileparts( cf.name ); % Get extension
end
[~,idx] = sort( fileExt ); % Get index to sort by extension
fileNames = fileNames( idx ); % Do the actual sorting
I have a folder origin_training with subfolders like FM, folder1, folder2, ... . I can get the list of image files in .png format into a cell called FM.
FM_dir = fullfile(origin_training, 'FM');
FM = struct2cell(dir(fullfile(FM_dir, '*.png')))';
My goal is to match the names in my folder with the images in my cd, and make a new folder FM with the images from cd. Image names in the two paths are identical. I can do that with:
% mother_folder = '...' my current directory
filenames = FM(:,1);
destination = fullfile(mother_folder, 'FM', FM(:,1));
cellfun(#copyfile,filenames, destination);
This is really slow. Takes minutes even for small amounts of images (~30).
My current directory has ~ 10000 images (the ones that correspond to FM, folder2, folder3, ...). What am I missing here?
An alternative would be to create a shell command and execute it. Assuming FM contains full paths for each file, or relative paths from the current directory, then:
FM_dir = fullfile(origin_training, 'FM');
destination = fullfile(mother_folder, 'FM');
curdir = cd(FM_dir);
FM = dir('*.png');
cmd = ['cp ', sprintf('%s ',FM.name), destination];
system(cmd);
cd(curdir);
If you're on Windows, replace 'cp' by 'copy'.
Note that here we're not creating a shell command per file to be copied (presumably what copyfile does), but a single shell command that copies all files at once. We're also not specifying the name for each copied file, we specify the names of the files to be copied and where to copy them to.
Note also that for this to work, mother_folder must be an absolute path.
I somehow put my code into a function and now it works at expected speed. I suspected that cd had to do with the low speed so I am only passing the full directories as character vectors.
The same approach works for my purpose or with a slight modification to just copy from A to B. As now, it works to match files in A to those on B and copy to B/folder_name
function [out] = my_copyfiles(origin_dir, folder_name, dest_dir, extension)
new_dir = fullfile(origin_dir, folder_name);
% Get a cell with filenames in the origin_dir/folder_name
% Mind transposing to have them in the rows
NEW = struct2cell(dir(fullfile(new_dir, extension)))';
dest_folder = fullfile(dest_dir, folder_name);
% disp(dest_folder)
if ~exist(dest_folder, 'dir')
mkdir(dest_folder)
end
%% CAUTION, This is the step
% Use names from NEW to generate fullnames in dest_dir that would match
filenames = fullfile(dest_dir, NEW(:,1));
destination = fullfile(dest_folder, NEW(:,1));
% if you wanted from origin, instead run
% filenames = fullfile(new_dir, NEW(:,1));
% make the copies
cellfun(#copyfile,filenames, destination);
%Save the call
out.originDir = new_dir;
out.copyFrom = dest_dir;
out.to = dest_folder;
out.filenames = filenames;
out.destination = destination;
end
Matlab script shown below, lists all folders inside current directory. How can I list all folders which match specific wildcard pattern such as *pattern* or Sample*?
files = dir(pwd)
% Get a logical vector that tells which is a directory.
dirFlags = [files.isdir]
% Extract only those that are directories.
subFolders = files(dirFlags)
% Print folder names to command window.
for k = 1 : length(subFolders)
fprintf('Sub folder #%d = %s\n', k, subFolders(k).name);
end
I have a directory at '../../My_Dir' relative to the Matlab working directory. This directory itself has several sub-directories in it. Each sub-directory then has several files in it.
I want to create two-dimensional array, or a matrix, of strings. Each row represents one of the sub-directories. The first column in the row is the full path to the sub-directory itself, and the other columns are the full paths to the files within that sub-directory.
Can anyone show me some code that will help me to implement this? Thank you!
You can first get all the sub-folders by
d = dir(pathFolder);
isub = [d(:).isdir];
subFolders = {d(isub).name}';
Note, you further need to remove . and .. from it:
subFolders(ismember(subFolders,{'.','..'})) = [];
And then get files in each of them by using (from this post):
function fileList = getAllFiles(dirName)
dirData = dir(dirName); %# Get the data for the current directory
dirIndex = [dirData.isdir]; %# Find the index for directories
fileList = {dirData(~dirIndex).name}'; %'# Get a list of the files
if ~isempty(fileList)
fileList = cellfun(#(x) fullfile(dirName,x),... %# Prepend path to files
fileList,'UniformOutput',false);
end
subDirs = {dirData(dirIndex).name}; %# Get a list of the subdirectories
validIndex = ~ismember(subDirs,{'.','..'}); %# Find index of subdirectories
%# that are not '.' or '..'
for iDir = find(validIndex) %# Loop over valid subdirectories
nextDir = fullfile(dirName,subDirs{iDir}); %# Get the subdirectory path
fileList = [fileList; getAllFiles(nextDir)]; %# Recursively call getAllFiles
end
end
To get the folders FULL PATH:
d = dir(baseDir);
d(~[d.isdir])= []; %Remove all non directories.
names = setdiff({d.name},{'.','..'});
filesFullPath = names;
for i=1:size(names,2)
filesFullPath{i} = fullfile(baseDir,names{1,i});
end
Matlab... "thanks" for this s...
As an example, let's say I have three subfolders in 'My_Dir' called 'A' (containing 'a1.txt' and 'a2.txt'), 'B' (containing 'b1.txt'), and 'C' (containing 'c1.txt', 'c2.txt', and 'c3.txt'). This will illustrate how to handle a case with different numbers of files in each subfolder...
For MATLAB versions R2016b and later, the dir function supports recursive searching, allowing us to collect a list of files like so:
dirData = dir('My_Dir\*\*.*'); % Get structure of folder contents
dirData = dirData(~[dirData.isdir]); % Omit folders (keep only files)
fileList = fullfile({dirData.folder}.', {dirData.name}.'); % Get full file paths
fileList =
6×1 cell array
'...\My_Dir\A\a1.txt'
'...\My_Dir\A\a2.txt'
'...\My_Dir\B\b1.txt'
'...\My_Dir\C\c1.txt'
'...\My_Dir\C\c2.txt'
'...\My_Dir\C\c3.txt'
As an alternative, in particular for earlier versions, this can be done using a utility I posted to the MathWorks File Exchange: dirPlus. It can be used as follows:
dirData = dirPlus('My_Dir', 'Struct', true, 'Depth', 1);
fileList = fullfile({dirData.folder}.', {dirData.name}.');
Now we can format fileList in the way you specified above. First we can use unique to get a list of unique subfolders and an index. That index can then be used with mat2cell and diff to break fileList up by subfolder into a second level of cell array encapsulation:
[dirList, index] = unique({dirData.folder}.');
outData = [dirList mat2cell(fileList, diff([index; numel(fileList)+1]))]
outData =
3×2 cell array
'...\My_Dir\A' {2×1 cell}
'...\My_Dir\B' {1×1 cell}
'...\My_Dir\C' {3×1 cell}
How can I process all the files with ".xyz" extension in a folder? The basic idea is that I want a list of file names and then a for loop to load each file.
As others have already mentioned, you should use the DIR function to list files in a directory.
If you are still looking, here is an example to show how to use the function:
dirName = 'C:\path\to\folder'; %# folder path
files = dir( fullfile(dirName,'*.xyz') ); %# list all *.xyz files
files = {files.name}'; %'# file names
data = cell(numel(files),1); %# store file contents
for i=1:numel(files)
fname = fullfile(dirName,files{i}); %# full path to file
data{i} = myLoadFunction(fname); %# load file
end
Of course, you would have to supply the function that actually reads and parses the XYZ files.
Use dir() to obtain a list of filenames. You can specify wildcards.
You could use
fileName=ls('*xyz').
fileName variable will have the list of all the filenames which you can use in the for loop
Here is my answer:
dirName = 'E:\My Matlab\5';
[sub,fls] = subdir(dirName);
D = [];
j = 1;
for i=1:length(sub),
files{i} = dir( fullfile(sub{i},'*.xyz') );
if length(files{i})==1
D(j) = i;
files_s{j} = sub{i};
j=j+1;
end
end
varaible files_s returns the desire paths that contain those specific data types!
The subdir function can be found at:
http://www.mathworks.com/matlabcentral/fileexchange/1492-subdir--new-