How to skip specific files in multiple subfolders in MATLAB? - matlab

I need to skip some specific files in different sub-folders. I tried the "if filename" line in the snippet below so that, if the filename does not contain raw or info, some operations are performed, but it doesn't work. I would really appreciate it if someone could point me in the right direction: how can I skip filenames that contain specific strings such as "raw" or "info"?
input_dirName = dir('D:\Neda\Pytorch\CAMUS\training\');
Output_dirName = 'D:\Neda\Pytorch\CAMUS\data\';
GT_dirName = 'D:\Neda\Pytorch\CAMUS\GT\';
dirName = 'D:\Neda\Pytorch\CAMUS\training\';
fileList = SureScan_getAllFiles(dirName);
foldername = fullfile({input_dirName.folder}, {input_dirName.name});
foldername = foldername(3:end);
for k = 1:length(fileList)-50
filename = fileList{k};
if filename ~= contains(filename,'raw') | filename ~= contains(filename,'Info_') | filename ~= contains(filename,'sequence.mhd')| filename ~=contains(filename,'_sequence')
% do some operation
end
end

The output of contains is either true or false and hence it will never be equal to any filename.
To skip filenames that have any of 'raw', 'Info_', 'sequence.mhd' or '_sequence', use:
if ~contains(filename, {'raw', 'Info_', 'sequence.mhd', '_sequence'})
%do some operation
end
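Because contains also accepts a whole cell array of names, the same test can be applied to the full list at once instead of inside the loop. The following is only a minimal sketch, assuming MATLAB R2016b or later (where contains is available); the file names are made up for illustration and stand in for the real output of SureScan_getAllFiles:

```matlab
% Hypothetical file list (assumption: stands in for SureScan_getAllFiles output)
fileList = {
    'D:\data\patient0001\patient0001_2CH_ED.mhd'
    'D:\data\patient0001\patient0001_2CH_ED.raw'
    'D:\data\patient0001\Info_2CH.cfg'
    'D:\data\patient0001\patient0001_2CH_sequence.mhd'};

% true for every name that contains at least one of the patterns
skip = contains(fileList, {'raw', 'Info_', 'sequence.mhd', '_sequence'});

% keep only the names that matched none of the patterns
keep = fileList(~skip);

for k = 1:numel(keep)
    filename = keep{k};
    % do some operation
end
```

Filtering first keeps the loop body free of skip logic, which may be easier to read when the list of patterns grows.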

Related

Matlab - How can I find the lowest common directory of an arbitrary group of files?

I have a cell array of full file names and I want to find the lowest common directory where it makes sense to store accumulated data and what not.
Here is an example hierarchy of test data:
C:\Test\Run1\data1
C:\Test\Run1\data2
C:\Test\Run1\data3
C:\Test\Run2\data1
C:\Test\Run2\data2
.
.
.
In Matlab, the paths are stored in a cell array as follows (each run shares a row):
filePaths = {...
'C:\Test\Run1\data1','C:\Test\Run1\data2','C:\Test\Run1\data3'; ...
'C:\Test\Run2\data1','C:\Test\Run2\data2','C:\Test\Run2\data3'};
I want to write a routine that outputs the common path C:\Test\Run1 so that I can store relevant plots in a new directory there.
C:\Test\Run1\Accumulation_Plots
C:\Test\Run2\Accumulation_Plots
.
.
.
Previously, I was only concerned with two files in an x-by-2 cell, so the routine below worked; however, strcmp lost its appeal since I can't (AFAIK) index the whole cell at once.
d = 1;
while strcmp(filePaths{1}(1:d),filePaths{2}(1:d))
d = d + 1;
end
common_directory = filePaths{1}(1:d-1);
mkdir(common_directory,'Accumulation_Plots');
As suggested by @nekomatic, I'm posting my comment as an answer.
filePaths = {...
'C:\Test\Run1\data1','C:\Test\Run1\data2','C:\Test\Run1\data3'; ...
'C:\Test\Run2\data1','C:\Test\Run2\data2','C:\Test\Run2\data3'};
% Sort the file paths
temp = sort(filePaths(:));
% Take the first and the last one, and split by '\'
first = strsplit(temp{1}, '\');
last = strsplit(temp{end}, '\');
% Compare them up to the depth of the smallest. Find the 'first N matching values'
sizeMin = min(numel(first), numel(last));
N = find(~[cellfun(@strcmp, first(1:sizeMin), last(1:sizeMin)) 0], 1, 'first') - 1;
% Get the smallest common path
commonPath = strjoin(first(1:N), '\');
You just need to compare the first d characters of any path in the array - e.g. path 1 - with the first d characters of the other paths. The longest common base path can't be longer than path 1 and it can't be shorter than the shortest common base path between path 1 and any other path.
There must be several ways you could do that, but a concise one is using strfind to match the strings and cellfun with isempty to check which ones didn't match:
% filePaths should contain at least two paths
filePaths = {...
'C:\Test\Run1\data1','C:\Test\Run1\data2','C:\Test\Run1\data3'; ...
'C:\Test\Run2\data1','C:\Test\Run2\data2','C:\Test\Run2\data3'};
path1 = filePaths{1};
filePaths = filePaths(2:end);
% find longest common left-anchored substring
d = 1;
while ~any(cellfun(@isempty, strfind(filePaths, path1(1:d))))
d = d + 1;
end
% find common base path from substring
[common_directory, ~, ~] = fileparts(path1(1:d));
Your code leaves d containing the length of the longest common left-anchored substring between the paths, but that might be longer than the common base path; fileparts extracts the actual base path from that substring.
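Regarding the remark in the question that strcmp cannot compare against the whole cell at once: strncmp accepts a char vector and a cell array together, so a strictly left-anchored comparison is possible. This is only a sketch under the assumption that the paths use '\' as separator, as in the example data; the variable names are illustrative:

```matlab
filePaths = {...
    'C:\Test\Run1\data1','C:\Test\Run1\data2','C:\Test\Run1\data3'; ...
    'C:\Test\Run2\data1','C:\Test\Run2\data2','C:\Test\Run2\data3'};

paths  = filePaths(:);                    % flatten to a column cell array
ref    = paths{1};                        % reference path
maxLen = min(cellfun(@numel, paths));     % prefix cannot exceed the shortest path

% grow d while the first d+1 characters of every path equal those of ref
d = 0;
while d < maxLen && all(strncmp(ref, paths, d + 1))
    d = d + 1;
end
commonPrefix = ref(1:d);                  % longest common left-anchored substring

% cut back to the last separator to get a real directory
lastSep   = find(commonPrefix == '\', 1, 'last');
commonDir = commonPrefix(1:lastSep - 1);
```

For the example data, commonPrefix is 'C:\Test\Run' and commonDir is 'C:\Test'. The bound on maxLen keeps the loop from indexing past the end of a path when one path is a prefix of all the others.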

Matlab | How to load/use files with consecutive names (abc1, abc2, abc3) and then pass on to the next series (cba1, cba2, cba3)?

I have a folder containing a series of data with file names like this:
abc1
abc2
abc3
bca1
bca2
bca3
bca4
bca5
cba1
... etc
My goal is to load all the relevant files for each file name, so all the "abc" files, and plot them in one graph. Then move on to the next file name, and do the same, and so forth. Is there a way to do this?
This is what I currently have to load and run through all the files, grab the data in them and get their name (without the .mat extension) to be able to save the graph with the same filename.
dirName = 'C:\DataDirectory';
files = dir( fullfile(dirName,'*.mat') );
files = {files.name}';
data = cell(numel(files),1);
for i=1:numel(files)
fname = fullfile(dirName,files{i});
disp(fname);
files{i} = files{i}(1:length(files{i})-4);
disp(files{i});
[Rest of script]
end
You already found out about the cool features of dir, and have a cell array files, which contains all file names, e.g.
files =
'37abc1.mat'
'37abc2.mat'
'50bca1.mat'
'50bca2.mat'
'1cba1.mat'
'1cba2.mat'
The main task now is to find all prefixes (37abc, 50bca, 1cba, ...) that are present in files. This can be done using a regular expression (regexp). The pattern can look like this:
'([\d]*[\D]*)[\d]*\.mat'
i.e. take any number of digits ([\d]*), then any number of non-digit characters ([\D]*), and capture both as a token (by enclosing them in parentheses). Next come any number of digits ([\d]*), followed by the literal text .mat (the dot is escaped so that it does not match an arbitrary character).
We call the regexp function with that pattern:
pre = regexp(files,'([\d]*[\D]*)[\d]*\.mat','tokens');
resulting in a cell array (one cell for each entry in files), where each cell contains another cell array with the prefix of that file. To convert this to a simple not-nested cell array, we call
pre = [pre{:}];
pre = [pre{:}];
resulting in
pre =
'37abc' '37abc' '50bca' '50bca' '1cba' '1cba'
To remove duplicate entries, we use the unique function:
pre = unique(pre);
pre =
'1cba' '37abc' '50bca'
which leaves us with all prefixes that are present (note that unique also sorts them). Now you can loop through each of these prefixes and apply your processing. Everything put together is:
% Find all files
dirName = 'C:\DataDirectory';
files = dir( fullfile(dirName,'*.mat') );
files = {files.name}';
% Find unique prefixes
pre = regexp(files,'([\d]*[\D]*)[\d]*\.mat','tokens');
pre = [pre{:}]; pre = [pre{:}];
pre = unique(pre);
% Loop through prefixes
for ii=1:numel(pre)
% Get files with this prefix
curFiles = dir(fullfile(dirName,[pre{ii},'*.mat']));
curFiles = {curFiles.name}';
% Loop through all files with this prefix
for jj=1:numel(curFiles)
% Here the magic happens
end
end
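As a variation on the loop above, the grouping can also be done without calling dir a second time, by using the third output of unique as a group index. This is a sketch only: it assumes every name matches the pattern (a non-matching name would make the token extraction fail), and the file names are the example ones from this answer:

```matlab
files = {'37abc1.mat'; '37abc2.mat'; '50bca1.mat'; '50bca2.mat'; '1cba1.mat'};

% one token per file: leading digits plus the non-digit part of the name
tok    = regexp(files, '^(\d*\D*)\d*\.mat$', 'tokens', 'once');
prefix = cellfun(@(t) t{1}, tok, 'UniformOutput', false);

% groups: sorted unique prefixes; idx(k) is the group number of files{k}
[groups, ~, idx] = unique(prefix);

for g = 1:numel(groups)
    members = files(idx == g);   % all files sharing this prefix
    % load and plot the members of this group here
end
```

Here groups comes out as {'1cba'; '37abc'; '50bca'}, and idx maps each file back to its group, so files(idx == g) selects one series at a time.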
Sorry, I misunderstood your question, I found this solution:
file = dir('*.mat')
matching = regexp({file.name}, '^[a-zA-Z_]+[0-9]+\.mat$', 'match', 'once'); %// Match once on file name, must be a series of A-Z a-z chars followed by numbers.
matching = matching(~cellfun('isempty', matching));
str = unique(regexp(matching, '^[a-zA-Z_]*', 'match', 'once'));
str = str(~cellfun('isempty', str));
group = cell(size(str));
for is = 1:length(str)
ismatch = strncmp(str{is}, matching, length(str{is}));
group{is} = matching(ismatch);
end
Answer came from this source: Matlab Central

Loading multiple text files from a single directory in matlab

First time here so please be gentle
So the basic idea is that I have folders containing only txt files, each with about 20000 points. I only want specific intervals from each of them.
I have made a single file with the ranges, which looks like this:
2715 2955
1132 1372
with each row representing the range I want from one file.
I want to batch load all the files and export just the ranges of each. I've lost too much sleep over this, please help.
dirName = '*'; %# folder path
files = dir( fullfile(dirName,'*.txt') ); %# list all *.xyz files
files = {files.name}' ; %'# file names
data = cell(numel(files),1) ; %# store file contents
for u=1:numel(files)
A=files{u} ; %# full path to file
files{u};
STR1 = A
B=load(STR1);
end
This is all I have come up with in 2 days. I'm new to MATLAB.
Thanks
The MATLAB documentation for fscanf is very helpful: http://www.mathworks.co.uk/help/matlab/ref/fscanf.html. Also, your call to load does not include the path. Replace the last two lines in your for loop with:
STR1 = fullfile(dirName, A)
fileID = fopen(STR1,'r');
formatSpec = '%f';
B = fscanf(fileID,formatSpec)
fclose(fileID);
Or try:
delim = ' ';
nrhdr = 0;
STR1 = fullfile(dirName, A)
A = importdata(STR1, delim, nrhdr);
I'm assuming there are no header lines. For a purely numeric file, A is then the numeric data itself; if importdata returns a struct instead (for example because the file contains text), the numbers are in A.data.
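Putting the pieces together for the original question, a loop that loads each file and keeps only its range could look roughly like the sketch below. It is not a definitive solution: the folder, file names and range values are invented so that the example runs on its own, and it assumes that the rows of the range table are in the same order as the files returned by dir:

```matlab
% Build a small throwaway data set (hypothetical names and values)
dataDir = tempname;
mkdir(dataDir);
for n = 1:2
    fid = fopen(fullfile(dataDir, sprintf('trace%d.txt', n)), 'w');
    fprintf(fid, '%g\n', (1:100)' * n);   % one value per line
    fclose(fid);
end
ranges = [11 20; 31 35];                  % one [first last] row per file

% Load every file and keep only the requested interval
files    = dir(fullfile(dataDir, '*.txt'));
segments = cell(numel(files), 1);
for u = 1:numel(files)
    values      = load(fullfile(dataDir, files(u).name));  % numeric column
    segments{u} = values(ranges(u, 1):ranges(u, 2));
end
```

In real use, ranges would be read from the range file (for example with load), and each segments{u} could then be written out with whatever export function suits the target format.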

Separating an array based on whether it contains a phrase or not

I am really just a noob at Matlab, so please don't get upset if I use wrong syntax. I am currently writing a small program in which I put all .xlsx file names from a certain directory into an array. I now want to separate the files into two different arrays based on their name. This is what I tried:
files = dir('My_directory\*.xlsx')
file_number = 1;
file_amount = length(files);
while file_number <= file_amount;
file_name = files(file_number).name;
filescs = [];
filescwf = [];
if strcmp(file_name,'*cs.xlsx') == 1;
filescs = [filescs,file_name];
else
filescwf = [filescwf,file_name];
end
file_number = file_number + 1
end
The idea here is that strcmp(file_name,'*cs.xlsx') checks if file_name contains 'cs' at the end. If it does, it is put into filescs, if it doesn't it is put into filescwf. However, this does not seem to work...
Any thoughts?
strcmp(file_name,'*cs.xlsx') checks whether file_name is identical to *cs.xlsx. If there is no file by that name (hint: few file systems allow '*' as part of a file name), it will always be false. (btw: there is no need for the '==1' comparison or the semicolon on the respective line)
You can use array indexing here to extract the relevant part of the filename you want to compare. file_name(1:5), will give you the first 5 characters, file_name(end-5:end) will give you the last 6, for example.
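The indexing idea above can be sketched as follows. The file names are invented for illustration, and the length check is an added assumption to avoid indexing errors on names shorter than the suffix:

```matlab
% Hypothetical file names
names  = {'report_cs.xlsx', 'report_cwf.xlsx', 'cs.xlsx', 'a.xlsx'};
suffix = 'cs.xlsx';
n      = numel(suffix);

% true where the last n characters of the name equal the suffix
isCs = cellfun(@(s) numel(s) >= n && strcmp(s(end-n+1:end), suffix), names);

filescs  = names(isCs);    % names ending in cs.xlsx
filescwf = names(~isCs);   % everything else
```

Storing the names in cell arrays, rather than concatenating them with [filescs, file_name], keeps each name as a separate element instead of merging them into one long char vector.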
strcmp doesn't work with wildcards such as *cs.xlsx. See this question for an alternative approach.
You can use regexp to check the final letters of each of your files, then cellfun to apply regexp to all your filenames.
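Here, getIndex will have 1's for all the files ending with cs.xlsx. The $ anchor makes sure that cs.xlsx is at the end of the name, and the dot is escaped so that it matches only a literal dot. Note the ~ in front of cellfun: regexp returns an empty result for names that do not match, so isempty has to be negated.
files = dir('*.xlsx')
filenames = {files.name}'; %get filenames
getIndex = ~cellfun(@isempty,regexp(filenames, 'cs\.xlsx$'));
list1 = filenames(getIndex);
list2 = filenames(~getIndex);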

How to get all files under a specific directory in MATLAB?

I need to get all those files under D:\dic and loop over them to further process individually.
Does MATLAB support this kind of operations?
It can be done in other scripts like PHP,Python...
Update: Given that this post is quite old, and I've modified this utility a lot for my own use during that time, I thought I should post a new version. My newest code can be found on The MathWorks File Exchange: dirPlus.m. You can also get the source from GitHub.
I made a number of improvements. It now gives you options to prepend the full path or return just the file name (incorporated from Doresoom and Oz Radiano) and apply a regular expression pattern to the file names (incorporated from Peter D). In addition, I added the ability to apply a validation function to each file, allowing you to select them based on criteria other than just their names (i.e. file size, content, creation date, etc.).
NOTE: In newer versions of MATLAB (R2016b and later), the dir function has recursive search capabilities! So you can do this to get a list of all *.m files in all subfolders of the current folder:
dirData = dir('**/*.m');
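To turn that recursive listing into full paths that can be looped over, the folder and name fields can be combined. This is a minimal sketch, assuming R2016b or later (for the '**' wildcard and the folder field); the temporary folder tree is invented so that the example is self-contained:

```matlab
% Throwaway folder tree (hypothetical names)
rootDir = tempname;
mkdir(fullfile(rootDir, 'sub', 'deeper'));
fclose(fopen(fullfile(rootDir, 'top.m'), 'w'));
fclose(fopen(fullfile(rootDir, 'sub', 'deeper', 'nested.m'), 'w'));
fclose(fopen(fullfile(rootDir, 'sub', 'notes.txt'), 'w'));

listing   = dir(fullfile(rootDir, '**', '*.m'));   % recursive search
listing   = listing(~[listing.isdir]);             % drop any folders that matched
filePaths = fullfile({listing.folder}, {listing.name})';

for k = 1:numel(filePaths)
    % process filePaths{k} here
end
```

Only top.m and nested.m end up in filePaths; notes.txt is excluded by the pattern.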
Old code: (for posterity)
Here's a function that searches recursively through all subdirectories of a given directory, collecting a list of all file names it finds:
function fileList = getAllFiles(dirName)
dirData = dir(dirName); %# Get the data for the current directory
dirIndex = [dirData.isdir]; %# Find the index for directories
fileList = {dirData(~dirIndex).name}'; %'# Get a list of the files
if ~isempty(fileList)
fileList = cellfun(@(x) fullfile(dirName,x),... %# Prepend path to files
fileList,'UniformOutput',false);
end
subDirs = {dirData(dirIndex).name}; %# Get a list of the subdirectories
validIndex = ~ismember(subDirs,{'.','..'}); %# Find index of subdirectories
%# that are not '.' or '..'
for iDir = find(validIndex) %# Loop over valid subdirectories
nextDir = fullfile(dirName,subDirs{iDir}); %# Get the subdirectory path
fileList = [fileList; getAllFiles(nextDir)]; %# Recursively call getAllFiles
end
end
After saving the above function somewhere on your MATLAB path, you can call it in the following way:
fileList = getAllFiles('D:\dic');
You're looking for dir to return the directory contents.
To loop over the results, you can simply do the following:
dirlist = dir('.');
for i = 1:length(dirlist)
dirlist(i)
end
This should give you output in the following format, e.g.:
name: 'my_file'
date: '01-Jan-2010 12:00:00'
bytes: 56
isdir: 0
datenum: []
I used the code mentioned in this great answer and expanded it to support 2 additional parameters which I needed in my case. The parameters are file extensions to filter on and a flag indicating whether to concatenate the full path to the name of the file or not.
I hope it is clear enough and that someone will find it beneficial.
function fileList = getAllFiles(dirName, fileExtension, appendFullPath)
dirData = dir([dirName '/' fileExtension]); %# Get the data for the current directory
dirWithSubFolders = dir(dirName);
dirIndex = [dirWithSubFolders.isdir]; %# Find the index for directories
fileList = {dirData.name}'; %'# Get a list of the files
if ~isempty(fileList)
if appendFullPath
fileList = cellfun(@(x) fullfile(dirName,x),... %# Prepend path to files
fileList,'UniformOutput',false);
end
end
subDirs = {dirWithSubFolders(dirIndex).name}; %# Get a list of the subdirectories
validIndex = ~ismember(subDirs,{'.','..'}); %# Find index of subdirectories
%# that are not '.' or '..'
for iDir = find(validIndex) %# Loop over valid subdirectories
nextDir = fullfile(dirName,subDirs{iDir}); %# Get the subdirectory path
fileList = [fileList; getAllFiles(nextDir, fileExtension, appendFullPath)]; %# Recursively call getAllFiles
end
end
Example for running the code:
fileList = getAllFiles(dirName, '*.xml', 0); %#0 is false obviously
You can use regexp or strcmp to eliminate . and ..
Or you could use the isdir field if you only want files in the directory, not folders.
list=dir(pwd); %get info of files/folders in current directory
isfile=~[list.isdir]; %determine index of files vs folders
filenames={list(isfile).name}; %create cell array of file names
or combine the last two lines:
filenames={list(~[list.isdir]).name};
For a list of folders in the directory excluding . and ..
dirnames={list([list.isdir]).name};
dirnames=dirnames(~(strcmp('.',dirnames)|strcmp('..',dirnames)));
From this point, you should be able to throw the code in a nested for loop, and continue searching each subfolder until your dirnames returns an empty cell for each subdirectory.
This answer does not directly answer the question but may be a good solution outside of the box.
I upvoted gnovice's solution, but want to offer another solution: Use the system dependent command of your operating system:
tic
asdfList = getAllFiles('../TIMIT_FULL/train');
toc
% Elapsed time is 19.066170 seconds.
tic
[status,cmdout] = system('find ../TIMIT_FULL/train/ -iname "*.wav"');
C = strsplit(strtrim(cmdout));
toc
% Elapsed time is 0.603163 seconds.
Positive:
Very fast (in my case for a database of 18000 files on linux).
You can use well tested solutions.
You do not need to learn or reinvent a new syntax to select, e.g., *.wav files.
Negative:
You are not system independent.
You rely on a single string which may be hard to parse.
I don't know a single-function method for this, but you can use genpath to recurse a list of subdirectories only. This list is returned as a single string of directories delimited by the path separator (a semicolon on Windows), so you'll have to separate it using strread, i.e.
dirlist = strread(genpath('/path/of/directory'),'%s','delimiter',';')
If you don't want to include the given directory, remove the first entry of dirlist, i.e. dirlist(1)=[]; since it is always the first entry.
Then get the list of files in each directory with a looped dir.
filenamelist=[];
for d=1:length(dirlist)
% keep only filenames
filelist=dir(dirlist{d});
filelist={filelist.name};
% remove '.' and '..' entries
filelist([strmatch('.',filelist,'exact');strmatch('..',filelist,'exact')])=[];
% or to ignore all hidden files, use filelist(strmatch('.',filelist))=[];
% prepend directory name to each filename entry, separated by filesep*
for f=1:length(filelist)
filelist{f}=[dirlist{d} filesep filelist{f}];
end
filenamelist=[filenamelist filelist];
end
filesep returns the directory separator for the platform on which MATLAB is running.
This gives you a list of filenames with full paths in the cell array filenamelist. Not the neatest solution, I know.
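On more recent MATLAB versions the same idea can be written without strread and strmatch. The following is a sketch rather than a drop-in replacement: it splits the genpath output on pathsep so that it is not tied to the Windows semicolon, and the temporary folder tree is invented for illustration:

```matlab
% Throwaway folder tree (hypothetical names)
rootDir = tempname;
mkdir(fullfile(rootDir, 'a', 'b'));
fclose(fopen(fullfile(rootDir, 'a', 'one.txt'), 'w'));
fclose(fopen(fullfile(rootDir, 'a', 'b', 'two.txt'), 'w'));

% genpath returns one string; split it on the platform's path separator
dirlist = strsplit(genpath(rootDir), pathsep);
dirlist = dirlist(~cellfun(@isempty, dirlist));   % drop empty pieces, if any

filenamelist = {};
for d = 1:numel(dirlist)
    entries = dir(dirlist{d});
    entries = entries(~[entries.isdir]);          % drops '.', '..' and subfolders
    for f = 1:numel(entries)
        filenamelist{end+1} = fullfile(dirlist{d}, entries(f).name); %#ok<AGROW>
    end
end
```

Filtering on isdir removes the '.' and '..' entries together with the subfolders, which are visited anyway because they appear in dirlist.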
This is a handy function for getting filenames, with the specified format (usually .mat) in a root folder!
function filenames = getFilenames(rootDir, format)
% Get filenames with specified `format` in given `rootDir`
%
% Parameters
% ----------
% - rootDir: char vector
% Target folder
% - format: char vector = 'mat'
% File format
% default values
if ~exist('format', 'var')
format = 'mat';
end
format = ['*.', format];
filenames = dir(fullfile(rootDir, format));
filenames = arrayfun(...
@(x) fullfile(x.folder, x.name), ...
filenames, ...
'UniformOutput', false ...
);
end
In your case, you can use the following snippet :)
filenames = getFilenames('D:/dic/**');
for i = 1:numel(filenames)
filename = filenames{i};
% do your job!
end
With a little modification, an almost identical approach gets the full path of each file in every subfolder:
dataFolderPath = 'UCR_TS_Archive_2015/';
dirData = dir(dataFolderPath); %# Get the data for the current directory
dirIndex = [dirData.isdir]; %# Find the index for directories
fileList = {dirData(~dirIndex).name}'; %'# Get a list of the files
if ~isempty(fileList)
fileList = cellfun(@(x) fullfile(dataFolderPath,x),... %# Prepend path to files
fileList,'UniformOutput',false);
end
subDirs = {dirData(dirIndex).name}; %# Get a list of the subdirectories
validIndex = ~ismember(subDirs,{'.','..'}); %# Find index of subdirectories
%# that are not '.' or '..'
for iDir = find(validIndex) %# Loop over valid subdirectories
nextDir = fullfile(dataFolderPath,subDirs{iDir}); %# Get the subdirectory path
getAllFiles = dir(nextDir);
for k = 1:1:size(getAllFiles,1)
validFileIndex = ~ismember(getAllFiles(k,1).name,{'.','..'});
if(validFileIndex)
filePathComplete = fullfile(nextDir,getAllFiles(k,1).name);
fprintf('The Complete File Path: %s\n', filePathComplete);
end
end
end