How to force Matlab to read files in a folder serially?

I have files in a folder that are numbered from writer_1 to writer_20. I wrote code to read all the files and store them in cells, but the files are not read in numerical order.
folders = dir(Path_training);
folders(ismember( {folders.name}, {'.', '..'}) ) = []; %Remove these two from list
training = [];
for i = 1:length(folders)
current_folder = [Path_training folders(i).name '\'];
.
.
.
.
.
Here folders(1).name is writer_1 and folders(2).name is writer_10
I know that dir will return its results as the explorer does but is there any way to force it to go numerically?
I'm training an SVM based on these numbers and this problem is making it difficult.

I don't know of any direct solutions to the problem you are having.
I found a solution for a problem similar to yours, here
List = dir('*.png');
Name = {List.name};
S = sprintf('%s,', Name{:}); % '10.png,100.png,1000.png,20.png, ...'
D = sscanf(S, '%d.png,'); % [10; 100; 1000; 20; ...]
[sortedD, sortIndex] = sort(D); % Sort numerically
sortedName = Name(sortIndex); % Apply sorting index to original names
Differences are:
You are dealing with directories instead of files
Your directories have other text in their names in addition to the numbers
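The linked snippet can be adapted to cover both differences. Below is a minimal sketch, assuming every folder name has the exact form writer_N. The names cell array is a hypothetical stand-in for {folders.name}, so the example runs without touching the disk.

```matlab
% Hypothetical stand-in for {folders.name} after '.' and '..' are removed
names = {'writer_1', 'writer_10', 'writer_2', 'writer_20'};

% Join the names into one string so a single sscanf call can parse them all
S = sprintf('%s,', names{:});      % 'writer_1,writer_10,writer_2,writer_20,'

% The literal prefix 'writer_' is part of the format; sscanf reuses the
% format until the input is exhausted
D = sscanf(S, 'writer_%d,');       % column vector of the numeric suffixes

[~, sortIndex] = sort(D);          % numeric sort order
sortedNames = names(sortIndex);    % apply the order to the original names
```

With a real listing, folders = folders(sortIndex) would reorder the struct array returned by dir in the same way.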

Approach #1
%// From your code
folders = dir(Path_training);
folders(ismember( {folders.name}, {'.', '..'}) ) = []
%// Convert folders struct to a cell array with all of the data from dir
folders_cellarr = struct2cell(folders)
%// Get filenames
fn = folders_cellarr(1,:)
%// Get numeral part and sorted indices
num=str2double(cellfun(@(x) strjoin(regexp(x,['\d'],'match'),''), fn(:),'uni',0))
[~,ind] = sort(num)
%// Re-arrange folders based on the sorted indices
folders = folders(ind)
Approach #2
If you would like to avoid struct2cell, here's an alternative approach -
%// Get filenames
fn = cellstr(ls(Path_training))
fn(ismember(fn,{'.', '..'}))=[]
%// Get numeral part and sorted indices
num=str2double(cellfun(@(x) strjoin(regexp(x,['\d'],'match'),''), fn(:),'uni',0))
[~,ind] = sort(num)
%// List directory and re-arrange the elements based on the sorted indices
folders = dir(Path_training);
folders(ismember( {folders.name}, {'.', '..'}) ) = []
folders = folders(ind)
Please note that strjoin is a recent addition to MATLAB Toolbox. So, if you are on an older version of MATLAB, here's the source code link from MATLAB File-exchange.

Stealing a bit from DavidS and with the assumption that your folders all are of the form "writer_XX" with XX being digits.
folders = dir([pwd '\temp']);
folders(ismember( {folders.name}, {'.', '..'}) ) = [];
% extract numbers from cell array
foldersNumCell = regexp({folders.name}, '\d+', 'match', 'once'); % 'once' gives one string per name instead of a nested cell
% convert from cell array of strings to double
foldersNumber = str2double(foldersNumCell);
% get sort order
[garbage,sortI] = sort(foldersNumber);
% rearrange the structure
folders = folders(sortI);
The advantage of this is that it avoids a for loop. In practice it only makes a difference if you have tens of thousands of folders. (I created 50,000 folders labeled 'writer_1' to 'writer_50000'; the difference in execution time was about 1.2 seconds.)

Here is a slightly different way (edited to fix a bug and to implement the suggestion of @Divakar to eliminate the for loop)
folders = dir(Path_training);
folders(ismember( {folders.name}, {'.', '..'}) ) = [];
%// Get folder numbers as cell array of strings
folder_nums_cell = regexp({folders.name}, '\d*', 'match');
%// Convert cell array to vector of numbers
folder_nums = str2double(vertcat(folder_nums_cell{:}));
%// Sort original folder array
[~,inds] = sort(folder_nums);
folders = folders(inds);
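The regexp-based answers depend on the shape of what regexp returns for a cell array of names. The sketch below, using hypothetical folder names in place of {folders.name}, shows the difference between the plain 'match' output and the 'match', 'once' output. It is an illustration under the assumption that each name contains exactly one run of digits.

```matlab
% Hypothetical stand-in for {folders.name}
names = {'writer_1', 'writer_10', 'writer_2'};

% Without 'once', every element is itself a cell holding all matches
nested = regexp(names, '\d+', 'match');
isNested = iscell(nested{1});            % each name maps to a 1x1 cell

% With 'once', every element is the first match as a char vector,
% which str2double can convert directly
flat = regexp(names, '\d+', 'match', 'once');
nums = str2double(flat);                 % numeric suffixes, same order as names

[~, order] = sort(nums);                 % numeric sort order
sortedNames = names(order);
```

Flattening with vertcat(nested{:}) reaches the same strings, so either form can feed str2double.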

Related

Only Import File when it contains certain numbers from a Table

I got a couple 100 sensor measurement files all containing the date and time of measurement. All the files have names that include date and time. Example:
07-06-2016_17-58-32.wf
07-06-2016_18-02-32.wf
...
...
08-06-2016_17-48-26.wf
I have a function (importfile) and a loop that imports my data. The loop looks like this:
Files = dir('C:\Osci\User\*.waveform');
numFiles = length(Files);
Data = cell(1, numFiles);
for fileNum = 1:numFiles
Data{fileNum} = importfile(Files(fileNum).name);
end
Not all of these waveform files are useful. The measurement files are only useful if they were generated in a certain time period. I got a table that shows my allowed time periods:
07-Jun-2016 18:00:01
07-Jun-2016 18:01:31
07-Jun-2016 18:02:01
...
I want to modify my loop, so that the files (.waveform files) are only imported if the numbers for day (first number), hour (4th number) and minute (5th number) from the files match the numbers of the table containing the allowed time periods.
EDIT: Rather than a scalar hour, minute, and second, there is a vector of each. In my case, MyDay, MyHour and MyMinute are 1100x1 matrices while fileTimes only consists of 361 rows.
So, using the provided example the loop should only import file
07-06-2016_18-02-32.wf
since it is the only one where the numbers match (in this case 7, 18, 02).
EDIT2: Using @erfan's answer (and changing some directories and variable names) I have the following working code:
fmtstr = 'O:\\Basic_Research_All\\Lange\\Skripe ISAT\\Rohdaten\\*_%02i-*-*_%02i-%02i-*.wf';
Files = struct([]);
n = size(MyDayMyHourMyMinute, 1); % number of allowed day/hour/minute rows
for N = 1:n
Files = [Files; dir(sprintf(fmtstr, MyDayMyHourMyMinute(N,:)))];
end
numFiles = length(Files);
WaveformData = cell(1, numFiles);
for fileNum = 1:numFiles
WaveformData{fileNum} = importfile(Files(fileNum).name);
end
Since your filenames are pretty well defined as dates and times, you can prefilter your list by turning them into actual dates and times:
% Get the file list
Files = dir('C:\Osci\User\*.waveform');
% You only need the names
Files = {Files.name};
% Get just the filename w/o the extension
[~, baseFileNames] = cellfun(@(x) fileparts(x), Files, 'UniformOutput', false);
% Your filename is just a date (day first, e.g. 07-06-2016 is 7 June), so parse it as such
fileTimes = datevec(baseFileNames, 'dd-mm-yyyy_HH-MM-SS');
% Now pick out the files you want
% goodFiles = fileTimes(:, 4) == myHour & fileTimes(:, 5) == myMinute & fileTimes(:, 6) == mySecond;
goodFiles = ismember(fileTimes(:, 4:6), [myHour(:), myMinute(:), mySecond(:)], 'rows');
% Pare down your list of filenames
Files = Files(goodFiles);
% Preallocate your data cell
Data = cell(1, numel(Files));
% Now do your loop
for idx = 1:numel(Data)
Data{idx} = importfile(Files{idx});
end
You will, of course, need to define myHour, myMinute and mySecond. Of course, using the logical indexing in goodFiles, you could impose any sort of time criteria, like time or date range. If you find that your filenames aren't so well defined, you could parse out the filename using textscan or strfind to get the bits you want. The important thing is that cell arrays can be indexed into in much the same way as numerical or string arrays and it's often better to vectorize your filter criteria and then only do the loop on the parts you have to.
The OP indicated in a comment below that rather than a scalar hour, minute, and second, there is a vector of each. In that case, use ismember to match the two time vectors and return a logical index vector. With 2015a, MathWorks introduced the function ismembertol, which allows one to check membership within a certain tolerance.
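The question asks for a match on day, hour and minute rather than hour, minute and second. A minimal sketch of that variant follows, assuming fileTimes has the six-column layout produced by datevec. The sample rows and the allowed matrix are made up from the file names and table in the question.

```matlab
% datevec-style rows: [year month day hour minute second]
fileTimes = [2016 6 7 17 58 32; ...
             2016 6 7 18  2 32; ...
             2016 6 8 17 48 26];

% Hypothetical allowed periods as [day hour minute] rows
allowed = [7 18 0; ...
           7 18 1; ...
           7 18 2];

% A file is kept when its [day hour minute] triple equals some allowed row
goodFiles = ismember(fileTimes(:, 3:5), allowed, 'rows');
```

Only the second file (07-06-2016_18-02-32) matches here, which agrees with the expected result stated in the question.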
You can apply your selection from the beginning. Imagine the acceptable values for day, hour and minute are saved in acc as an n*3 matrix. If you replace the first line of your code with:
fmtstr = 'C:\\Osci\\User\\%02i-*-*_%02i-%02i-*.wf'; % backslashes doubled because sprintf treats '\' as an escape character
Files = struct([]);
for ii = 1:size(acc, 1)
    Files = [Files; dir(sprintf(fmtstr, acc(ii,:)))];
end
Then you have already applied your criteria to Files. The rest is the same.
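To see what this approach hands to dir, it helps to print the wildcard patterns that sprintf produces. The sketch below assumes acc holds [day hour minute] rows; the two rows are hypothetical values taken from the file names in the question.

```matlab
% Backslashes are doubled so that sprintf emits a single '\' for each
fmtstr = 'C:\\Osci\\User\\%02i-*-*_%02i-%02i-*.wf';

% Hypothetical acceptable [day hour minute] combinations
acc = [7 18 2; ...
       8 17 48];

% Build one wildcard pattern per row; dir(patterns{ii}) would list the matches
patterns = cell(size(acc, 1), 1);
for ii = 1:size(acc, 1)
    patterns{ii} = sprintf(fmtstr, acc(ii, :));
end
```

Each pattern fixes the day, hour and minute and leaves month, year and second as wildcards, so the filtering happens in the directory listing itself.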

Matlab: Import multiple numeric csv .txt files into single cell array

I have multiple (say N of them) .txt files consisting of numeric csv data in matrix form. I would like to import each of these data files into one (1 x N) cell array, whilst preserving the original matrix form. If the original data is small, say 3x3, then textscan does the job in the following manner:
fileID = fopen('data1.txt');
A{1} = textscan(fileID, '%d %d %d', 'delimiter',',','CollectOutput',1);
(This will be part of a function.) But what if my .txt files have 100 columns of data? I could write '%d' 100 times in the formatSpec, but there must be a better way?
This seems to be an easy problem, but I'm quite new to Matlab and am at a loss as to how to proceed. Any advice would be much appreciated, thanks!!
For such cases with consistent data in each of those text files, you can use importdata without worrying about format specifiers. Two approaches are discussed here based on it.
Approach 1
filenames = {'data1.txt' 'data2.txt' 'data3.txt'}; %// cell array of filenames
A = cell(1,numel(filenames)); %// Pre-allocation
for k = 1:numel(filenames)
imported_data = importdata(filenames{k});
formatted_data = cellfun(@str2num, imported_data, 'uni', 0);
A{k} = vertcat(formatted_data{:})
end
Approach 2
Assuming those text files are the only .txt files in the current working directory, you can directly get the filenames and use them to store data from them into a cell array, like this -
files = dir('*.txt');
A = cell(1,numel(files)); %// Pre-allocation
for k = 1:numel(files)
imported_data = importdata(files(k).name);
formatted_data = cellfun(@str2num, imported_data, 'uni', 0);
A{k} = vertcat(formatted_data{:})
end
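If staying with textscan is preferred, the format specifier does not have to be typed out by hand: repmat can build it for any column count. This is a sketch rather than part of the answer above. It parses an in-memory character vector (textscan accepts one in place of a file identifier) and uses a hypothetical column count of 5 and %f instead of %d.

```matlab
nCols = 5;                                   % hypothetical column count; 100 works the same way
formatSpec = repmat('%f', 1, nCols);         % '%f%f%f%f%f'

% Stand-in for the file contents: two rows of comma-separated numbers
txt = sprintf('1,2,3,4,5\n6,7,8,9,10\n');

% 'CollectOutput' merges the columns into a single numeric matrix
C = textscan(txt, formatSpec, 'Delimiter', ',', 'CollectOutput', true);
M = C{1};                                    % 2-by-5 matrix
```

For a real file, the first argument would be the identifier returned by fopen, followed by fclose once the data is read.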

Find duplicates in matlab path

Duplicates in the matlab path are a hassle, because you can not control which one gets executed. A first step to handle duplicates is to find them. How can I find duplicate .m files in my matlab path ?
Well, in itself it's not a herculean task. We just have to list all .m files in the path and find multiple occurrences of the same file name. We can use a mix of the path, what, and unique functions.
function find_duplicate()
P=path;
P=strsplit(P, pathsep());
% mydir='/home/myusername/matlabdir';
% P=P(strncmpi(mydir,P,length(mydir)));
P=cellfun(@(x) what(x),P,'UniformOutput',false);
P=vertcat(P{:});
Q=arrayfun(@(x) x.m,P,'UniformOutput',false); % Q is a cell of cells of strings
Q=vertcat(Q{:});
R=arrayfun(@(x) repmat({x.path},size(x.m)),P,'UniformOutput',false); % R is a cell of cells of strings
R=vertcat(R{:});
[C,ia,ic]=unique(Q);
for c=1:numel(C)
ind=strcmpi(C{c},Q);
if sum(ind)>1
fprintf('duplicate %s at paths\n\t',C{c});
fprintf('%s\n\t',R{ind});
fprintf('\n');
end
end
end
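The counting step of this function can be tried in isolation. The sketch below uses made-up file names and folders in place of the Q and R built from the path, and uses the third output of unique with accumarray to count occurrences. That is a variation on the strcmpi loop, not the author's exact method.

```matlab
% Hypothetical stand-ins: Q holds file names, R the folder each one lives in
Q = {'foo.m'; 'bar.m'; 'foo.m'; 'baz.m'};
R = {'/a';    '/a';    '/b';    '/b'};

% ic maps every entry of Q to its position in the sorted unique list C
[C, ~, ic] = unique(Q);

% Count how many times each unique name occurs
counts = accumarray(ic, 1);

% Names that occur more than once, and the folders that contain them
dupNames = C(counts > 1);
dupFolders = R(strcmp(Q, dupNames{1}));
```

Note that unique and strcmp are case-sensitive, whereas the function above compares with strcmpi; on a case-insensitive file system the latter may be the safer choice.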
Rather than handling the complete Matlab path, one can restrict the search for duplicates to one's own folder. To do that, just uncomment the two lines that define and use mydir, and replace the directory name with one of your choice.
To analyze a given folder (recursively), you can proceed as follows.
folder = 'C:\Users\Luis\Desktop'; %// folder to be analyzed
[ success files id ] = fileattrib([folder filesep '*']); %// this is recursive
[fullNames{1:numel(files)}] = deal(files.Name);
isMFile = cellfun(@(s) all(s(end-1:end)=='.m'), fullNames);
fullNames = fullNames(isMFile); %// keep only m-files
F = numel(fullNames);
start = cellfun(@(s) find(s==filesep,1,'last'), fullNames);
names = arrayfun(@(k) fullNames{k}(start(k)+1:end), 1:F, 'uni', 0); %// file name
[ii jj] = ndgrid(1:F); %// generate all pairs
equal = arrayfun(@(n) strcmp(names{ii(n)},names{jj(n)}), 1:F^2); %// test each
%// pair of files
equal = reshape(equal,F,F) - eye(F); %// equality with oneself doesn't count
isDuplicate = any(equal); %// it is a duplicate if it has some equal file
duplicates = fullNames(isDuplicate); %// cell array with full names of duplicates
To test the whole path, use the above code in a loop over all folders in the path. You can do it along the following lines (I haven't tested it, as I don't have the strsplit function):
p = path;
p = strsplit(p, pathsep);
duplicates = {};
for kk = 1:numel(p)
    folder = p{kk};
    [ success files id ] = fileattrib([folder filesep '*']);
    if ~success, continue; end %// skip folders that cannot be listed
    fullNames = {files.Name}; %// rebuild the list for each folder
    isMFile = cellfun(@(s) all(s(end-1:end)=='.m'), fullNames);
    fullNames = fullNames(isMFile);
    F = numel(fullNames);
    start = cellfun(@(s) find(s==filesep,1,'last'), fullNames);
    names = arrayfun(@(k) fullNames{k}(start(k)+1:end), 1:F, 'uni', 0);
    [ii jj] = ndgrid(1:F);
    equal = arrayfun(@(n) strcmp(names{ii(n)},names{jj(n)}), 1:F^2);
    equal = reshape(equal,F,F) - eye(F);
    isDuplicate = any(equal);
    duplicates = [duplicates, fullNames(isDuplicate)]; %// append to previous ones
end

Subset folder contents Matlab

I have about 1500 images within a folder named 3410001ne => 3809962sw. I need to subset about 470 of these files to process with Matlab code. Below is the section of code prior to my for loop which lists all of the files in a folder:
workingdir = 'Z:\project\code\';
datadir = 'Z:\project\input\area1\';
outputdir = 'Z:\project\output\area1\';
cd(workingdir) %points matlab to directory containing code
files = dir(fullfile(datadir, '*.tif'))
fileIndex = find(~[files.isdir]);
for i = 1:length(fileIndex)
fileName = files(fileIndex(i)).name;
Files also have ordinal directions attached (e.g. 3410001ne, 3410001nw), however, not all directions are associated with each root. How can I subset the folder contents to include 470 of 1500 files ranging from 3609902sw => 3610032sw? Is there a command where you can point Matlab to a range of files in a folder, rather than the entire folder? Thanks in advance.
Consider the following:
%# generate all possible file names you want to include
ordinalDirections = {'n','s','e','w','ne','se','sw','nw'};
includeRange = 3609902:3610032;
s = cellfun(@(d) cellstr(num2str(includeRange(:),['%d' d '.tif'])), ...
ordinalDirections, 'UniformOutput',false); %# names include the .tif extension so they can match dir output
s = sort(vertcat(s{:}));
%# get image filenames from directory
files = dir(fullfile(datadir, '*.tif'));
files = {files.name};
%# keep only subset of the files matching the above
files = files(ismember(files,s));
%# process selected files
for i=1:numel(files)
fname = fullfile(datadir,files{i});
img = imread(fname);
end
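A variation that avoids generating every candidate name is to read the numeric prefix of each file name and keep those inside the range. The sketch below assumes every file name starts with digits, as in the question. The files cell array is a hypothetical stand-in for the names returned by dir.

```matlab
% Hypothetical stand-in for {files.name}
files = {'3609901ne.tif', '3609902sw.tif', '3610000nw.tif', '3610033se.tif'};

% Read the leading integer of each name; parsing stops at the first letter
nums = cellfun(@(s) sscanf(s, '%d', 1), files);

% Keep the names whose numeric root lies inside the wanted range
inRange = nums >= 3609902 & nums <= 3610032;
subset = files(inRange);
```

This filters on the numeric root only. If the range boundaries also depend on the direction suffix (for example starting exactly at 3609902sw), an extra comparison on the suffix would be needed.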
Something like this maybe could work.
list = dir(fullfile(datadir,'*.tif')); %get list of files
fileNames = {list.name}; % Make cell array with file names
%Make cell array with the range of wanted files.
wantedNames = arrayfun(@num2str,3609902:3610032,'uniformoutput',0);
%Loop through the list with filenames and compare to wantedNames.
idx = cell(numel(fileNames), numel(wantedNames)); %preallocate
for i=1:length(fileNames)
% Each row in idx will be as long as wantedNames. The cells will be empty if
% fileNames{i} is unmatched and 1 if match.
idx(i,:) = regexp(fileNames{i},wantedNames);
end
idx = ~(cellfun('isempty',idx)); %look for non-empty cells.
idx = any(idx,2); %true for each file that matches some wanted name
fileNames = fileNames(idx); %keep only the wanted files

How to efficiently find correlation and discard points outside 3-sigma range in MATLAB?

I have a data file m.txt that looks something like this (with a lot more points):
286.842995
3.444398
3.707202
338.227797
3.597597
283.740414
3.514729
3.512116
3.744235
3.365461
3.384880
Some of the values (like 338.227797) are very different from the values I generally expect (smaller numbers).
So, I am thinking that
I will remove all the points that lie outside the 3-sigma range. How can I do that in MATLAB?
Also, the bigger problem is that this file has a separate file t.txt associated with it which stores the corresponding time values for these numbers. So, I'll have to remove the corresponding time values from the t.txt file also.
I am still learning MATLAB, and I know there would be some good way of doing this (better than storing indices of the elements that were removed from m.txt and then removing those elements from the t.txt file)
@Amro is close, but the FIND is unnecessary (look up logical subscripting) and you need to include the mean for a true +/-3 sigma range. I would go with the following:
%# load files
m = load('m.txt');
t = load('t.txt');
%# find values within range
z = 3;
meanM = mean(m);
sigmaM = std(m);
I = abs(m - meanM) <= z * sigmaM;
%# keep values within range
m = m(I);
t = t(I);
%# load files
m = load('m.txt');
t = load('t.txt');
%# find outliers indices
z = 3;
idx = find( abs(m-mean(m)) > z*std(m) );
%# remove them from both data and time values
m(idx) = [];
t(idx) = [];
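Either version can be checked on synthetic data before running it on the real files. The sketch below replaces m.txt and t.txt with made-up vectors: 99 ordinary values and a single large outlier. It applies the same logical mask to both vectors so that they stay aligned.

```matlab
% Synthetic stand-ins for the contents of m.txt and t.txt
m = [3.5 * ones(99, 1); 300];      % 99 typical values plus one outlier
t = (1:100)';                      % matching time stamps

% Logical mask of the values within 3 standard deviations of the mean
z = 3;
keep = abs(m - mean(m)) <= z * std(m);

% Apply the same mask to both vectors
m = m(keep);
t = t(keep);
```

With only a handful of points and several large outliers, as in the short excerpt in the question, the outliers inflate the standard deviation and may not be removed in a single pass.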