Subset folder contents Matlab - matlab

I have about 1500 images within a folder named 3410001ne => 3809962sw. I need to subset about 470 of these files to process with Matlab code. Below is the section of code prior to my for loop which lists all of the files in a folder:
workingdir = 'Z:\project\code\';
datadir = 'Z:\project\input\area1\';
outputdir = 'Z:\project\output\area1\';
cd(workingdir) %points matlab to directory containing code
files = dir(fullfile(datadir, '*.tif'))
fileIndex = find(~[files.isdir]);
for i = 1:length(fileIndex)
fileName = files(fileIndex(i)).name;
Files also have ordinal directions attached (e.g. 3410001ne, 3410001nw), however, not all directions are associated with each root. How can I subset the folder contents to include 470 of 1500 files ranging from 3609902sw => 3610032sw? Is there a command where you can point Matlab to a range of files in a folder, rather than the entire folder? Thanks in advance.

Consider the following:
%# generate all possible file names you want to include
ordinalDirections = {'n','s','e','w','ne','se','sw','nw'};
includeRange = 3609902:3610032;
s = cellfun(@(d) cellstr(num2str(includeRange(:),['%d' d])), ...
ordinalDirections, 'UniformOutput',false);
s = strcat(sort(vertcat(s{:})), '.tif'); %# append the extension so the names match what dir returns
%# get image filenames from directory
files = dir(fullfile(datadir, '*.tif'));
files = {files.name};
%# keep only subset of the files matching the above
files = files(ismember(files,s));
%# process selected files
for i=1:numel(files)
fname = fullfile(datadir,files{i});
img = imread(fname);
end
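An alternative that avoids enumerating every possible name is to parse the leading number out of each file name and compare it against the range. This is only a sketch: the names below are hypothetical stand-ins for the output of dir, and it assumes every file name starts with digits.

```matlab
% Hypothetical file names standing in for {files.name} from dir
names = {'3609901sw.tif', '3609902sw.tif', '3610001ne.tif', ...
         '3610032sw.tif', '3610033nw.tif'};

% Pull the leading run of digits out of each name (assumes every name has one)
tok  = regexp(names, '^(\d+)', 'tokens', 'once');
nums = cellfun(@(t) str2double(t{1}), tok);

% Keep only names whose number falls inside the wanted range (inclusive)
keep   = nums >= 3609902 & nums <= 3610032;
subset = names(keep);
disp(subset);
```

Because the comparison is numeric, the ordinal suffix and the case of the extension do not matter here.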

Something like this might work.
list = dir(fullfile(datadir,'*.tif')); %get list of files
fileNames = {list.name}; % Make cell array with file names
%Make cell array with the range of wanted files.
wantedNames = arrayfun(@num2str,3609902:3610032,'uniformoutput',0);
%Loop through the list with filenames and compare to wantedNames.
for i=1:length(fileNames)
% Each row in idx will be as long as wantedNames. The cells will be empty if
% fileNames{i} is unmatched and 1 if match.
idx(i,:) = regexp(fileNames{i},wantedNames);
end
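idx = ~(cellfun('isempty',idx)); %look for non-empty cells.
idx = logical(sum(idx,2)); %Sum each row; true where the file matched any wanted name
fileNames = fileNames(idx); %Keep only the matching files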

Related

read text files inside a zip file without unzipping in matlab

I would like to read text files inside a zip file without unzipping using Matlab
Read the data of CSV file inside Zip File without extracting the contents in Matlab
The approach suggested there works, and I get a cell array of file names:
zipFilename = 'C:\ZippedData.zip';
zipJavaFile = java.io.File(zipFilename);
% Create a Java ZipFile
zipFile = org.apache.tools.zip.ZipFile(zipJavaFile);
% Extract the entries from the ZipFile.
entries = zipFile.getEntries;
cnt = 1;
% Get Zip File Paths
while entries.hasMoreElements
tempObj = entries.nextElement;
file{cnt,1} = tempObj.getName.toCharArray';
cnt = cnt+ 1;
end
% Extract File Name
ind = regexp(file,'textfile.*');
ind = find(~cellfun(@isempty,ind)); % Find Non Empty Cell Index
file = file(ind);
% Create Absolute Path so that Windows consider as Directory
file = cellfun(@(x) fullfile('.',x),file,'UniformOutput',false);
This gives me .\file1, .\file2, ..., .\filen, but then how do I use that in fopen and, say, textscan? Something like fileID = fopen([zipFilename filesep file{1}]);?

Recursively read images from subdirectories

I am stuck on something that is supposed to be so simple.
I have a folder, say main_folder, with four subfolders, say sub1, sub2, sub3 and sub4, each containing over 100 images. I am trying to read them and store them in an array. I have looked all over the internet and at some MATLAB docs:
here, here and even the official doc.
My code is like this:
folder = 'main_folder/**'; %path containing all the training images
dirImage = dir('main_folder/**/*.jpg');%rdir(fullfile(folder,'*.jpg')); %reading the contents of directory
numData = size(dirImage,1); %no. of samples
arrayImage = zeros(numData, 133183); % zeros matrix for storing the extracted features from images
for i=1:numData
ifile = dirImage(i).name;
% ifolder = dirImage(i).folder;
I=imread([folder, '/', ifile]); %%%% read the image %%%%%
I=imresize(I,[128 128]);
...
If I try the code in the above snippet, the images are not read.
But if I replace the first two lines with something like:
folder = 'main_folder/'; %path containing all the training images
dirImage = dir('main_folder/sub1/*.jpg'); %rdir(fullfile(folder,'*.jpg'));
then all images in sub1 are read. How can I fix this? Any help will be highly appreciated. I want to read all the images in the four sub folders at once.
I am using MATLAB R2015a.
I believe you will need to use genpath to get all sub-folders, and then loop through each of them, like:
dirs = genpath('main_folder/'); % all folders recursively
dirs = regexp(dirs, pathsep, 'split'); % split into cellstr
dirs = dirs(~cellfun('isempty', dirs)); % drop the empty entry left by the trailing separator
for i = 1:numel(dirs)
dirImage = dir([dirs{i} '/*.jpg']); % jpg in one sub-folder
for j = 1:numel(dirImage)
img = imread([dirs{i} '/' dirImage(j).name]);
% process img using your code
end
end
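For readers on R2016b or later, dir itself accepts the ** wildcard and returns a folder field, so the genpath loop can be collapsed into a single listing. R2015a predates this, which is consistent with the original attempt failing. The sketch below builds a throwaway folder tree (hypothetical names, empty placeholder files) so it runs anywhere; in real use, root would be main_folder and each path would be passed to imread.

```matlab
% Build a small throwaway tree: root/sub1/a.jpg and root/sub2/b.jpg
root = tempname;
mkdir(fullfile(root, 'sub1'));
mkdir(fullfile(root, 'sub2'));
fclose(fopen(fullfile(root, 'sub1', 'a.jpg'), 'w'));
fclose(fopen(fullfile(root, 'sub2', 'b.jpg'), 'w'));

% R2016b+: '**' matches the root folder and every subfolder recursively
listing = dir(fullfile(root, '**', '*.jpg'));

% Combine the folder and name fields to get a full path for each file
paths = fullfile({listing.folder}, {listing.name});
for k = 1:numel(paths)
    fprintf('%s\n', paths{k});   % in real use: img = imread(paths{k});
end

rmdir(root, 's');                % remove the throwaway tree
```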

How to import and save in Matlab Multiple Text Files creating a Matrix for each files

I have a very large data set divided into folders: 100 folders with approximately 200 text files each. I have tried a for loop, first importing one file and then importing the rest in another command. However, I am not interested in a single data array; I want to keep each file's data under its own name, because I then have to match the dates among all the files, and the files do not all have the same number of columns.
Each text file is like the one I have attached, where the data I need runs from row 23 onward, up to column 13.
The files are saved as 010010.txt, 010030.txt, 010050.txt ... up to 014957.txt; the numbers are not sequential.
I have created a script for importing one file, but I would like to know how to repeat the same script for the rest.
filename = 'C:*\010010.txt';
startRow = 22;
formatSpec = '%4f%6f%6f%6f%6f%6f%6f%6f%6f%6f%6f%6f%6f%[^\n\r]';
fileID = fopen(filename,'r');
dataArray = textscan(fileID, formatSpec, 'Delimiter', '', 'WhiteSpace', '', 'HeaderLines' ,startRow-1, 'ReturnOnError', false);
fclose(fileID);
Untitled (010010) = [dataArray{1:end-1}];
I would like to repeat the same import process for the rest of the files. I would appreciate any suggestions.
The text files have the following format:
I only need the data from row 23 down and up to column 13. Each txt file has a different number of rows, as some have data from 1992 to 2014 and others only from 2000 to 2014. The first column is the year and columns 2 to 13 are the months.
I guess you know the basepath under which all your folders are. You can then use something like this:
% First find all folders
folders = cell(0); % empty cell to save folder names
nFolders = 0;
allFolders = ls(basePath); % find all files and folders
for k=1:size(allFolders,1)
curFolder = fullfile(basePath,strtrim(allFolders(k,:)));
if isdir(curFolder) % find out if it is a folder
if ~(allFolders(k,1) == '.') % ignore '.' and '..'
folders{nFolders+1,1} = curFolder; % Save folder path
nFolders = nFolders + 1;
end
end
end
% Then find all files inside these folders
files = cell(0); % empty cell array for file names
nFiles = 0;
for k=1:nFolders % go through all folders
allFiles = ls(folders{k,1});
for l=1:size(allFiles,1) % go through all found files/subfolders
curFile = fullfile(folders{k},strtrim(allFiles(l,:)));
if ~isdir(curFile) % only select files
files{nFiles+1,1} = curFile; % and save it to the cell
nFiles = nFiles + 1;
end
end
end
Now you can iterate through the files cell and read all files according to your script. I see you are interested in the file name. You can extract the file name by
[folderPath,filename,extension] = fileparts(files{k,1}); % folderPath avoids shadowing the built-in path function
To import text files, you can use dlmread, which I think is more intuitive than textscan (but has more limitations, of course). For that you don't have to open the file using fopen, you can directly supply the file name.
value = dlmread(fileName,' ',[23,13,23,13]);
The delimiter is now a white space and only the value at row=23 / col=13 is read. Note that the range starts at row/col=0, not 1 like normally in Matlab - so maybe you'll have to change it to [22,12,22,12].
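To keep each file's data under its own name, as the question asks, one option is a struct with one field per file. The following is a sketch only: the two small files, their contents, and the two-line header are made up so the example runs anywhere, and the 'f' prefix is needed because MATLAB field names must start with a letter. In real use, files would be the cell array built above and the header offset would be 22.

```matlab
% Create two stand-in text files (hypothetical contents, 2 header lines each)
tmp   = tempname;
mkdir(tmp);
names = {'010010.txt', '010030.txt'};
rows  = {[1992 1 2; 1993 3 4], [2000 5 6]};
for k = 1:numel(names)
    fid = fopen(fullfile(tmp, names{k}), 'w');
    fprintf(fid, 'header line 1\nheader line 2\n');
    fprintf(fid, '%d %d %d\n', rows{k}');   % transpose so rows print in order
    fclose(fid);
end
files = fullfile(tmp, names);

% Read every file into its own field, keyed by the file name
data = struct();
for k = 1:numel(files)
    [~, name] = fileparts(files{k});
    % Row/column offsets are zero-based: skip 2 header rows, start at column 0
    data.(['f' name]) = dlmread(files{k}, ' ', 2, 0);
end

disp(fieldnames(data));
rmdir(tmp, 's');                            % remove the stand-in files
```

Each matrix keeps its own number of rows, so files covering different year ranges can sit side by side and be matched on the first column later.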

How to force Matlab to read files in a folder serially?

I have files in a folder that are numbered from writer_1 to writer_20. I wrote a code to read all the files and store them in cells. But the problem is that the files are not read serially.
folders = dir(Path_training);
folders(ismember( {folders.name}, {'.', '..'}) ) = []; %Remove these two from list
training = [];
for i = 1:length(folders)
current_folder = [Path_training folders(i).name '\'];
...
Here folders(1).name is writer_1 and folders(2).name is writer_10
I know that dir will return its results as the explorer does but is there any way to force it to go numerically?
I'm training an SVM based on these numbers and this problem is making it difficult.
I don't know of any direct solutions to the problem you are having.
I found a solution for a problem similar to yours, here
List = dir('*.png');
Name = {List.name};
S = sprintf('%s,', Name{:}); % '10.png,100.png,1000.png,20.png, ...'
D = sscanf(S, '%d.png,'); % [10; 100; 1000; 20; ...]
[sortedD, sortIndex] = sort(D); % Sort numerically
sortedName = Name(sortIndex); % Apply sorting index to original names
Differences are:
You are dealing with directories instead of files
Your directories have other text in their names in addition to the numbers
Approach #1
%// From your code
folders = dir(Path_training);
folders(ismember( {folders.name}, {'.', '..'}) ) = []
%// Convert folders struct to a cell array with all of the data from dir
folders_cellarr = struct2cell(folders)
%// Get filenames
fn = folders_cellarr(1,:)
%// Get numeral part and sorted indices
num=str2double(cellfun(@(x) strjoin(regexp(x,['\d'],'match'),''), fn(:),'uni',0))
[~,ind] = sort(num)
%// Re-arrange folders based on the sorted indices
folders = folders(ind)
Approach #2
If you would like to avoid struct2cell, here's an alternative approach -
%// Get filenames
fn = cellstr(ls(Path_training))
fn(ismember(fn,{'.', '..'}))=[]
%// Get numeral part and sorted indices
num=str2double(cellfun(@(x) strjoin(regexp(x,['\d'],'match'),''), fn(:),'uni',0))
[~,ind] = sort(num)
%// List directory and re-arrange the elements based on the sorted indices
folders = dir(Path_training);
folders(ismember( {folders.name}, {'.', '..'}) ) = []
folders = folders(ind)
Please note that strjoin is a relatively recent addition to MATLAB (R2013a). So, if you are on an older version of MATLAB, here's the source code link from MATLAB File-exchange.
Stealing a bit from DavidS and with the assumption that your folders all are of the form "writer_XX" with XX being digits.
folders = dir([pwd '\temp']);
folders(ismember( {folders.name}, {'.', '..'}) ) = [];
% extract numbers from cell array ('once' yields a cellstr that str2double accepts)
foldersNumCell = regexp({folders.name}, '\d+', 'match', 'once');
% convert from cell array of strings to double
foldersNumber = str2double(foldersNumCell);
% get sort order
[garbage,sortI] = sort(foldersNumber);
% rearrange the structure
folders = folders(sortI);
The advantage of this is that it avoids a for loop. In practice, though, it only makes a difference if you have tens of thousands of folders. (I created 50,000 folders labeled 'writer_1' to 'writer_50000'; the difference in execution time was about 1.2 seconds.)
Here is a slightly different way (edited to fix bug and implement suggestion of @Divakar to eliminate for loop)
folders = dir(Path_training);
folders(ismember( {folders.name}, {'.', '..'}) ) = [];
%// Get folder numbers as cell array of strings
folder_nums_cell = regexp({folders.name}, '\d*', 'match');
%// Convert cell array to vector of numbers
folder_nums = str2double(vertcat(folder_nums_cell{:}));
%// Sort original folder array
[~,inds] = sort(folder_nums);
folders = folders(inds);
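The idea shared by all of these answers can be tried without touching the file system. A minimal sketch, using a made-up list of names in the lexicographic order dir would typically return and assuming each name contains exactly one run of digits:

```matlab
% Hypothetical folder names in lexicographic order
names = {'writer_1', 'writer_10', 'writer_2', 'writer_20', 'writer_3'};

% Extract the first run of digits from each name and convert to numbers
nums = str2double(regexp(names, '\d+', 'match', 'once'));

% Sort numerically and apply the same order to the names
[~, order] = sort(nums);
sorted = names(order);
disp(sorted);
```

The same order index can be applied to the struct array returned by dir, as in folders = folders(order).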

Matlab: Renaming files in a folder sequentially

If there are the following files in the folder C:\test\:
file1.TIF, file2.TIF .... file100.TIF
Can MATLAB automatically rename them to:
file_0001.TIF, file_0002.TIF, .... file_0100.TIF?
No-loop approach -
directory = 'C:\test\'; %//' Directory where TIFF images are present
filePattern = fullfile(directory, 'file*.tif'); %//' files pattern with absolute paths
old_filename = cellstr(ls(filePattern)) %// Get the filenames
file_ID = strrep(strrep(old_filename,'file',''),'.TIF','') %// Get numbers associated with each file
str_zeros = arrayfun(@(t) repmat('0',1,t), 4-cellfun(@numel,file_ID),'uni',0) %// Get zeros string to be pre-appended to each filename (pads to 4 digits)
new_filename = strcat('file_',str_zeros,file_ID,'.TIF') %// Generate new filenames
cellfun(@(m1,m2) movefile(m1,m2),fullfile(directory,old_filename),fullfile(directory,new_filename)) %// Finally rename files with the absolute paths
Edit 1:
For a case when you have the filenames as file27.TIF, file28.TIF, file29.TIF and so on and you would like to rename them as file0001.TIF, file0002.TIF, file0003.TIF and so on respectively, try this -
directory = 'C:\test\'; %//' Directory where TIFF images are present
filePattern = fullfile(directory, 'file*.tif'); %//' files pattern with absolute paths
old_filename = cellstr(ls(filePattern)) %// Get the filenames
file_ID = strrep(strrep(old_filename,'file',''),'.TIF','') %// Get numbers associated with each file
file_ID_doublearr = str2double(file_ID)
file_ID_doublearr = file_ID_doublearr - min(file_ID_doublearr)+1
file_ID = strtrim(cellstr(num2str(file_ID_doublearr)))
str_zeros = arrayfun(@(t) repmat('0',1,t), 4-cellfun(@numel,file_ID),'uni',0) %// Get zeros string to be pre-appended to each filename
new_filename = strcat('file',str_zeros,file_ID,'.TIF') %// Generate new filenames
cellfun(@(m1,m2) movefile(m1,m2),fullfile(directory,old_filename),fullfile(directory,new_filename)) %// Finally rename files with the absolute paths
A slightly more robust method:
dirlist = dir(fullfile(mypath,'*.TIF'));
fullnames = {dirlist.name}; % Get rid of one layer of cell array-ness
[~,fnames,~] = cellfun(@fileparts,fullnames,'UniformOutput',false); % Create cell array of the file names from the output of dir()
fnums = cellfun(@str2double,regexprep(fnames,'[^0-9]','')); % Delete any character that isn't a number, returns it as a vector of doubles
fnames = regexprep(fnames,'[0-9]',''); % Delete any character that is a number
for ii = 1:length(dirlist)
newname = sprintf('%s_%04d.TIF',fnames{ii},fnums(ii)); % Create new file name
oldfile = fullfile(mypath,dirlist(ii).name); % Generate full path to old file
newfile = fullfile(mypath,newname); % Generate full path to new file
movefile(oldfile, newfile); % Rename the files
end
Though this will accommodate filenames of any length, it does assume that there are no numbers in the filename other than the counter at the end. MATLAB likes to throw things into nested cell arrays, so I incorporated cellfun in a couple of places to bring things into more manageable formats. It also allows us to vectorize some of the code.
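Before calling movefile on real data, it can help to preview the old and new names side by side. This is a dry-run sketch under the same assumption as above (one run of digits per name); the file names are hypothetical and nothing on disk is touched.

```matlab
% Hypothetical existing names
oldNames = {'file1.TIF', 'file27.TIF', 'file100.TIF'};

% Extract the counter from each name
nums = str2double(regexp(oldNames, '\d+', 'match', 'once'));

% Build zero-padded names; %04d pads the counter to four digits
newNames = arrayfun(@(n) sprintf('file_%04d.TIF', n), nums, ...
                    'UniformOutput', false);

% Print the planned renames instead of performing them
for k = 1:numel(oldNames)
    fprintf('%s -> %s\n', oldNames{k}, newNames{k});
end
```

Once the preview looks right, the loop body can be swapped for a movefile call with full paths.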