vectorizing a script with cellfun - matlab

I'm aiming to import data from various folders and text files into MATLAB.
clear all
main_folder = 'E:\data';
%Directory of data
TopFolder = dir(main_folder);
%exclude the first two cells as they are just pointers.
TopFolder = TopFolder(3:end);
TopFolder = struct2cell(TopFolder);
Name1 = TopFolder(1,:);
%obtain the name of each folder
dirListing = cellfun(@(x)dir(fullfile(main_folder,x,'*.txt')),Name1,'un',0);
Variables = cellfun(@(x)struct2cell(x),dirListing,'un',0);
FilesToRead = cellfun(@(x)x(1,:),Variables,'un',0);
%obtain the name of each text file in each folder
This provides the name for each text file in each folder within 'main_folder'. I am now trying to load the data without using a for loop (I realise that for loops are sometimes faster in doing this but I'm aiming for a compact script).
The method I would use with a for loop would be:
for k = 1:length(FilesToRead)
filename{k} = cellfun(@(x)fullfile(main_folder,Name1{k},x),FilesToRead{k},'un',0);
fid{k} = cellfun(@(x)fopen(x),filename{k},'un',0);
C{k} = cellfun(@(x)textscan(x,'%f'),fid{k},'un',0);
end
Is there a method that avoids loops entirely? Something like cellfun within cellfun, maybe?

folder = 'E:\data';
files = dir(fullfile(folder, '*.txt'));
full_names = strcat(folder, filesep, {files.name});
fids = cellfun(@(x) fopen(x, 'r'), full_names);
c = arrayfun(@(x) textscan(x, '%f'), fids); % load data here
res = arrayfun(@(x) fclose(x), fids);
assert(all(res == 0), 'error in closing files');
But if the data is in CSV format, it can be even easier:
folder = 'E:\data';
files = dir(fullfile(folder, '*.txt'));
full_names = strcat(folder, filesep, {files.name});
c = cellfun(@(x) csvread(x), full_names, 'UniformOutput', false);
Now all the data is stored in c.

Yes. This is going to be pretty scary, since C depends on fid, which depends on filename. The basic idea will be:
deal(feval(@(filenames_fids){filenames_fids{1}, filenames_fids{2}, ...
<compute C>}, feval(@(filenames){filenames, <compute fid>}, ...
<compute filenames>)));
Let's start with computing the filenames:
arrayfun(@(k)cellfun(@(x)fullfile(main_folder,Name1{k},x),FilesToRead{k},...
'un',0), 1:length(FilesToRead), 'uniformoutput', 0);
This will give us a 1-by-K cell array of filenames. Now we can use that to compute fids:
{filenames, arrayfun(@(k)cellfun(@(x)fopen(x),filenames{k},'un',0), ...
1:length(FilesToRead), 'uniformoutput', 0)};
We stick fids together with filenames in a 1-by-2 cell array, ready to pass on to compute our final outputs:
{filenames_fids{1}, filenames_fids{2}, ...
arrayfun(@(k)cellfun(@(x)textscan(x,'%f'), ...
filenames_fids{2}{k},'un',0), 1:length(FilesToRead), 'uniformoutput', 0)}
Then we're putting that final cell array into deal, so that the results end up in three different variables.
tmp = feval(@(filenames_fids){filenames_fids{1}, ...
filenames_fids{2}, arrayfun(@(k)cellfun(@(x)textscan(x,'%f'), ...
filenames_fids{2}{k},'un',0), 1:length(FilesToRead), 'uniformoutput', 0)}, ...
feval(@(filenames){filenames, arrayfun(@(k)cellfun(@(x)fopen(x), ...
filenames{k},'un',0), 1:length(FilesToRead), 'uniformoutput', 0)}, ...
arrayfun(@(k)cellfun(@(x)fullfile(main_folder,Name1{k},x),FilesToRead{k}, ...
'un',0), 1:length(FilesToRead), 'uniformoutput', 0)));
% deal copies a single input to every output, so expand the cell into three inputs
[filenames, fid, C] = deal(tmp{:});
Errm... There's probably a nicer way to do this if you don't mind about keeping filenames and fid. Maybe using cellfun instead of arrayfun could also make it more concise, but I'm not very good with cellfuns, so this is what I came up with. I think the for loop version is more compact anyway! (also, I haven't actually tested this. It will probably need some debugging).
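If keeping filenames and fid isn't important, the cellfun-within-cellfun idea can be sketched more compactly. This is only a sketch under assumptions: it swaps fopen/textscan for readmatrix (available from R2019a), and the demo folder, the file x.txt, and the Name1 and FilesToRead values below are made up so that the snippet runs on its own.

```matlab
% Build a throwaway folder tree so the sketch is self-contained (hypothetical data).
main_folder = tempname;
mkdir(fullfile(main_folder, 'a'));
fid = fopen(fullfile(main_folder, 'a', 'x.txt'), 'w');
fprintf(fid, '%f\n', [1 2 3]);
fclose(fid);

Name1 = {'a'};               % subfolder names, as in the question
FilesToRead = {{'x.txt'}};   % one cell of file names per subfolder

% Outer cellfun walks the folders, inner cellfun walks the files in each folder.
% readmatrix opens and closes each file itself, so no fids are left open.
C = cellfun(@(folder, files) ...
        cellfun(@(f) readmatrix(fullfile(main_folder, folder, f)), ...
                files, 'UniformOutput', false), ...
        Name1, FilesToRead, 'UniformOutput', false);

rmdir(main_folder, 's');     % remove the demo folder
```

C{k}{j} then holds the data of the j-th file in the k-th folder. Whether this is actually more readable than the plain for loop is a matter of taste.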

Related

Matlab save table with different file names in different files

I'm trying to read different files (txt) using a datastore and readtable in order to parse them and write them into a .mat file.
I'm using ds = datastore('*.txt').Files to get all the file names in a directory, then with a for loop I iterate through all the different files and I save them with different names.
However when I import the file in matlab they have all the same table name (dat).
Here's the code:
ds = datastore('*.txt');
fnames = ds.Files;
l = length(fnames);
for i = 1:l
dat = readtable(fnames{i}, 'Delimiter', '\t');
dat.Properties.VariableNames(1:2) = {'rpm', 'p_coll'};
dat = removevars(dat{i},20:width(dat));
save([fnames{i} '.mat'],'dat');
end
I tried using an array of dat but it didn't work. Any ideas?
As Sardar said there is no point in storing your data in L different variables. If you have to do so, there probably is a bigger problem in your program design. You'd better describe why you need that.
As an alternative, you can load those files in a single cell array:
L = 10;
for ii =1:L
dat = 2*ii;
fn = sprintf('dat%d.mat', ii);
save(fn, 'dat');
end
dats = cell(L, 1);
for ii=1:L
fn = sprintf('dat%d.mat', ii);
load(fn);
dats{ii} = dat;
end
Another option is to load them in a struct:
dats = struct();
for ii=1:L
fn = sprintf('dat%d.mat', ii);
load(fn);
dats.(sprintf('dat%d', ii)) = dat;
end
I do not see any advantage in this method compared to cell array, but it's kind of funny.
Finally, if you really have a reason to store data in multiple variables, you can use eval:
for ii =1:L
dat = ii^2;
eval(sprintf('dat%d=dat;', ii));
fn = sprintf('dat%d.mat', ii);
save(fn, sprintf('dat%d', ii));
end
for ii=1:L
fn = sprintf('dat%d.mat', ii);
load(fn);
end
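If the goal is only that each .mat file contains a differently named variable, one way to get there without eval is a dynamic field name combined with save's '-struct' option, which writes each field of a struct as its own variable. The following is a rough sketch, not the asker's actual pipeline: the file names and the small tables are invented for the demo, and matlab.lang.makeValidName needs R2014a or later.

```matlab
outDir = tempname;                      % throwaway output folder for the demo
mkdir(outDir);
fnames = {'run-1.txt', 'run-2.txt'};    % hypothetical input file names

for i = 1:numel(fnames)
    % Stand-in for readtable(fnames{i}, ...): a small made-up table.
    dat = table((1:3)', i*(1:3)', 'VariableNames', {'rpm', 'p_coll'});

    [~, base] = fileparts(fnames{i});            % 'run-1'
    varName = matlab.lang.makeValidName(base);   % 'run_1' (valid identifier)

    s = struct();
    s.(varName) = dat;                           % dynamic field name
    % '-struct' saves the field as a variable called varName in the file.
    save(fullfile(outDir, [base '.mat']), '-struct', 's');
end

vars = who('-file', fullfile(outDir, 'run-1.mat'));   % names stored in the file
rmdir(outDir, 's');
```

Loading run-1.mat afterwards yields a variable named run_1 instead of dat.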

Matlab : How to open several .txt files using their names for variable names

I have several two-column .txt files that I'd like to import, using the second column for subsequent operations. My text files are named profileA1.txt, profileA2.txt, etc., and the variables holding the second-column data in the subsequent code are named A1, A2, A3, etc.
The code works, but currently I have to open each .txt file manually with the Import Data wizard, change the name of the second column, and click to import the selection. I tried to write some code (see below) to automate these steps, but it doesn't work. Does anyone have an idea how to fix it?
Thanks
for k = 1:5
myfilename = sprintf('profileA%d.txt', k);
mydata = importdata(myfilename);
Aloop = ['A' num2str(k)];
A{k} = load(myfilename.data(:,2), k);
end
Your example is a little off because you convert myfilename from a character array to a structure magically. I would approach this by reading the entire text file into a cell array of characters, then using cellfun and textscan to read in the columns. Like this:
function C = ReadTextFile(infile)
if (exist(infile, 'file') ~= 2)
error('ReadTextFile:INFILE','Unknown file: %s.\n', infile);
end
fid = fopen(infile, 'r');
C = textscan(fid, '%s', 'delimiter','\n');
C = C{1};
fclose(fid);
end
% assign an output file
outFile = 'SomeRandomFile.mat';
for k = 1:5
C = ReadTextFile(sprintf('profileA%d.txt', k));
val = cell2mat(cellfun(@(x) textscan(x, '%*f %f','CollectOutput',1),C));
varName = sprintf('A%d', k);
assignin('base', varName, val);
save(outFile, varName, '-append')
end
You can skip the whole reading as a character array first but I have that function already so I just reused it.
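For comparison, here is a rough sketch of the same import that keeps the second columns in a cell array A{k} rather than creating A1, A2, ... in the base workspace. It assumes R2019a or later for readmatrix and writematrix, and it generates its own small profileA files in a temporary folder, so the data is made up.

```matlab
demoDir = tempname;                     % throwaway folder with made-up data
mkdir(demoDir);
nFiles = 3;
for k = 1:nFiles
    % Two columns: an index and k times that index.
    writematrix([(1:4)', k*(1:4)'], ...
                fullfile(demoDir, sprintf('profileA%d.txt', k)));
end

A = cell(1, nFiles);                    % A{k} plays the role of Ak
for k = 1:nFiles
    M = readmatrix(fullfile(demoDir, sprintf('profileA%d.txt', k)));
    A{k} = M(:, 2);                     % keep only the second column
end

rmdir(demoDir, 's');
```

Later code can then use A{2} wherever it previously used A2.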

Dynamically name a struct variable in MATLAB

I have several files "2011-01-01.txt", "2013-01-02.txt", "2015-02-01.txt", etc.
I wish to create a struct variable for each of the files such that (the values are made up):
machine20110101.size=[1,2,3];
machine20110101.weight=2345;
machine20110101.price=3456;
machine20130102.size=[2,3,4];
machine20130102.weight=1357;
machine20130102.price=123;
machine20150201.size=[1,2,4];
machine20150201.weight=1357;
machine20150201.price=123;
And,
save('f20110101.mat','machine20110101');
save('f20130102.mat','machine20130102') ;
save('f20150201.mat','machine20150201');
As we can see, the struct names are derived from the files' names. How can I construct the above struct variables?
I've searched for a while, but I didn't figure out how to use genvarname.
And these links (dynamic variable names in matlab, dynamic variable declaration function matlab, Dynamically change variable name inside a loop in MATLAB) didn't solve my problem.
I'm using MATLAB R2012b, so functions like matlab.lang.makeUniqueStrings, which were introduced after this version, are unavailable.
Now that I'm in front of MATLAB, here's an example based on my comment above, utilizing dynamic field names with the filenames pruned using fileparts and regexprep in a cellfun call.
% Sample list for testing here, use uigetdir with dir or whatever method to
% get a list of files generically
filelist = {'C:\mydata\2011-01-01.txt', ...
'C:\mydata\2012-02-02.txt', ...
'C:\mydata\2013-03-03.txt', ...
'C:\mydata\2014-04-04.txt', ...
};
nfiles = length(filelist);
% Get filenames from your list of files
[~, filenames] = cellfun(@fileparts, filelist, 'UniformOutput', false);
% Prune unwanted characters from each filename and concatenate with 'machine'
prunedfilenames = regexprep(filenames, '-', '');
myfieldnames = strcat('machine', prunedfilenames);
% Generate your structure
for ii = 1:nfiles
% Parse your files for the data, using dummy variables since I don't
% know how your data is structured
loadedsize = [1, 2, 3];
loadedweight = 1234;
loadedprice = 1234;
% Add data to struct array
mydata.(myfieldnames{ii}).size = loadedsize;
mydata.(myfieldnames{ii}).weight = loadedweight;
mydata.(myfieldnames{ii}).price = loadedprice;
end
@patrik raises some good points in the comments. I think the more generic method he would like to see (please correct me if I'm wrong) goes something like this:
% Sample list for testing here, use uigetdir with dir or whatever method to
% get a list of files generically
filelist = {'C:\mydata\2011-01-01.txt', ...
'C:\mydata\2012-02-02.txt', ...
'C:\mydata\2013-03-03.txt', ...
'C:\mydata\2014-04-04.txt', ...
};
nfiles = length(filelist);
% Get filenames from your list of files
[~, filenames] = cellfun(@fileparts, filelist, 'UniformOutput', false);
% Prune unwanted characters from each filename and concatenate with 'machine'
prunedfilenames = regexprep(filenames, '-', '');
mytags = strcat('machine', prunedfilenames);
% Preallocate your structure
mydata = repmat(struct('tag', '', 'size', [1, 1, 1], 'weight', 1, 'price', 1), nfiles, 1);
% Fill your structure
for ii = 1:nfiles
% Parse your files for the data, using dummy variables since I don't
% know how your data is structured
loadedsize = [1, 2, 3];
loadedweight = 1234;
loadedprice = 1234;
% Add data to struct array
mydata(ii).tag = mytags{ii};
mydata(ii).size = loadedsize;
mydata(ii).weight = loadedweight;
mydata(ii).price = loadedprice;
end
Besides @excaza's answer, I used the following approach:
machine.size = [1,2,3]; machine.price = 335; machine.weight = 234;
machineName = ['machine',the_date];
machineSet = struct(machineName,machine);
save(OutputFile,'-struct','machineSet',machineName);

How to force Matlab to read files in a folder serially?

I have files in a folder that are numbered from writer_1 to writer_20. I wrote a code to read all the files and store them in cells. But the problem is that the files are not read serially.
folders = dir(Path_training);
folders(ismember( {folders.name}, {'.', '..'}) ) = []; %Remove these two from list
training = [];
for i = 1:length(folders)
current_folder = [Path_training folders(i).name '\'];
.
.
.
.
.
Here folders(1).name is writer_1 and folders(2).name is writer_10
I know that dir will return its results as the explorer does but is there any way to force it to go numerically?
I'm training an SVM based on these numbers and this problem is making it difficult.
I don't know of any direct solutions to the problem you are having.
I found a solution for a problem similar to yours, here
List = dir('*.png');
Name = {List.name};
S = sprintf('%s,', Name{:}); % '10.png,100.png,1000.png,20.png, ...'
D = sscanf(S, '%d.png,'); % [10; 100; 1000; 20; ...]
[sortedD, sortIndex] = sort(D); % Sort numerically
sortedName = Name(sortIndex); % Apply sorting index to original names
Differences are:
You are dealing with directories instead of files
Your directories have other text in their names in addition to the numbers
Approach #1
%// From your code
folders = dir(Path_training);
folders(ismember( {folders.name}, {'.', '..'}) ) = []
%// Convert folders struct to a cell array with all of the data from dir
folders_cellarr = struct2cell(folders)
%// Get filenames
fn = folders_cellarr(1,:)
%// Get numeral part and sorted indices
num=str2double(cellfun(@(x) strjoin(regexp(x,['\d'],'match'),''), fn(:),'uni',0))
[~,ind] = sort(num)
%// Re-arrange folders based on the sorted indices
folders = folders(ind)
Approach #2
If you would like to avoid struct2cell, here's an alternative approach -
%// Get filenames
fn = cellstr(ls(Path_training))
fn(ismember(fn,{'.', '..'}))=[]
%// Get numeral part and sorted indices
num=str2double(cellfun(@(x) strjoin(regexp(x,['\d'],'match'),''), fn(:),'uni',0))
[~,ind] = sort(num)
%// List directory and re-arrange the elements based on the sorted indices
folders = dir(Path_training);
folders(ismember( {folders.name}, {'.', '..'}) ) = []
folders = folders(ind)
Please note that strjoin is a recent addition to MATLAB Toolbox. So, if you are on an older version of MATLAB, here's the source code link from MATLAB File-exchange.
Stealing a bit from DavidS and with the assumption that your folders all are of the form "writer_XX" with XX being digits.
folders = dir([pwd '\temp']);
folders(ismember( {folders.name}, {'.', '..'}) ) = [];
% extract numbers from cell array
foldersNumCell = regexp({folders.name}, '\d+', 'match', 'once'); % 'once' returns strings, not nested cells
% convert from cell array of strings to double
foldersNumber = str2double(foldersNumCell);
% get sort order
[garbage,sortI] = sort(foldersNumber);
% rearrange the structure
folders = folders(sortI);
The advantage of this is that it avoids a for loop. In reality, though, it only makes a difference if you have tens of thousands of folders. (I created 50,000 folders labeled 'writer_1' to 'writer_50000'; the difference in execution time was about 1.2 seconds.)
Here is a slightly different way (edited to fix a bug and implement the suggestion of @Divakar to eliminate the for loop)
folders = dir(Path_training);
folders(ismember( {folders.name}, {'.', '..'}) ) = [];
%// Get folder numbers as cell array of strings
folder_nums_cell = regexp({folders.name}, '\d*', 'match');
%// Convert cell array to vector of numbers
folder_nums = str2double(vertcat(folder_nums_cell{:}));
%// Sort original folder array
[~,inds] = sort(folder_nums);
folders = folders(inds);
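To see the numeric sort in isolation, here is a small sketch on a hard-coded list of names instead of a dir listing. The names are made up, and it assumes every name contains exactly one run of digits.

```matlab
% Names in the order dir would typically return them (character order).
names = {'writer_1', 'writer_10', 'writer_2', 'writer_20', 'writer_3'};

% 'once' makes regexp return one string per name instead of a nested cell.
numStr = regexp(names, '\d+', 'match', 'once');
nums = str2double(numStr);          % numeric folder indices

[~, idx] = sort(nums);              % numeric sort order
sortedNames = names(idx);           % writer_1, writer_2, writer_3, writer_10, writer_20
```

The same idx can be applied to the struct array returned by dir, as in the answers above.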

Matlab: Import multiple numeric csv .txt files into single cell array

I have multiple (say N of them) .txt files consisting of numeric csv data in matrix form. I would like to import each of these data files into one (1 x N) cell array, whilst preserving the original matrix form. If the original data is small, say 3x3, then textscan does the job in the following manner:
fileID = fopen('data1.txt');
A{1} = textscan(fileID, '%d %d %d', 'delimiter',',','CollectOutput',1);
(This will be part of a function.) But what if my .txt files have 100 columns of data? I could write '%d' 100 times in the formatSpec, but there must be a better way?
This seems to be an easy problem, but I'm quite new to Matlab and am at a loss as to how to proceed. Any advice would be much appreciated, thanks!!
For such cases with consistent data in each of those text files, you can use importdata without worrying about format specifiers. Two approaches are discussed here based on it.
Approach 1
filenames = {'data1.txt' 'data2.txt' 'data3.txt'}; %// cell array of filenames
A = cell(1,numel(filenames)); %// Pre-allocation
for k = 1:numel(filenames)
imported_data = importdata(char(filenames(k)));
formatted_data = cellfun(@str2num, imported_data, 'uni', 0);
A{k} = vertcat(formatted_data{:})
end
Approach 2
Assuming those text files are the only .txt files in the current working directory, you can directly get the filenames and use them to store data from them into a cell array, like this -
files = dir('*.txt');
A = cell(1,numel(files)); %// Pre-allocation
for k = 1:numel(files)
imported_data = importdata(files(k).name);
formatted_data = cellfun(@str2num, imported_data, 'uni', 0)
A{k} = vertcat(formatted_data{:})
end
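On the narrower question of not typing '%d' 100 times: if you want to stay with textscan, the format string can be built with repmat once the number of columns is known. A minimal sketch, assuming comma-separated numeric data with the same number of columns on every line; the demo file and its contents are made up, '%f' is used instead of '%d' so the result is double, and strsplit needs R2013a or later.

```matlab
% Write a small made-up CSV file so the sketch runs on its own.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '1,2,3,4\n5,6,7,8\n');
fclose(fid);

fid = fopen(fname, 'r');
firstLine = fgetl(fid);                      % peek at the first row
nCols = numel(strsplit(firstLine, ','));     % count the columns
frewind(fid);                                % go back to the start of the file

fmt = repmat('%f', 1, nCols);                % '%f%f%f%f' for four columns
c = textscan(fid, fmt, 'Delimiter', ',', 'CollectOutput', true);
fclose(fid);

M = c{1};                                    % 2-by-4 numeric matrix
delete(fname);
```

Wrapping this in a loop over the file names and storing each M in A{k} gives the 1-by-N cell array asked for.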