Separating Files Based on their names - matlab

New to Matlab so this may be more simple than I'm realising.
I'm working with a large number of text files for some data analysis and would like to separate them into a number of categories. Their names follow a format similar to Tp_angle_RunNum.txt and Ts_angle_RunNum.txt. I would like to have 4 different groups: the Tp and Ts files for angle1, and the same for angle2.
Any help would be appreciated!

A few things here:
Remember that your text files are not necessarily going to be where you process your data, so think about how you want to store the data in memory while you work with it. It may be beneficial to pull all of the data into a MATLAB table. You can read your text files directly into tables with the function readtable(), and then add columns to capture Ts, Tp, and angle... or angle1 and angle2. You can use logicals for all 4, i.e., a true or false as to which group the data row belongs to. You could also capture the run number so you know exactly which file it came from. There are lots of ways to store the data, but if you get into very large data sets, tables are way easier to manipulate and work with. They are also very fast, and compact if you use categorical types where applicable.
dir() will likely be a necessary function for you for the dir listing, however there are some alternatives and some considerations based upon your platform. I suggest you take a look at the doc page for dir.
Take advantage of MATLAB strings and vector processing. It's not the only way to do things, but it is by far the easiest. Strings were introduced in R2016b, and have gotten better since as more and more features and capabilities work with strings. More recently, you can use patterns instead of (or with!) regular expressions for finding what you need to process. The doc page I linked above has some great examples, so no point in me reinventing that wheel.
fileparts() is also your friend in MATLAB when working with many files. You can use it to separate the path, the filename and the extension. You might use it to simplify your processing.
Regarding vector processing, you can take an entire array of strings and pass it to the functions used with pattern. Or, if you take my suggestion of working with tables, you can get rows of your table that match specific characteristics.
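As a minimal sketch of that vectorized approach (the file names below are made up to follow the Tp_angle_RunNum.txt convention, and the variable names are only illustrative), the whole array of names can be split in one call and turned into a table whose rows are then selected by group:

```matlab
% Hypothetical file names following the Tp_angle_RunNum.txt convention.
names = ["Tp_21_1.txt"; "Ts_21_1.txt"; "Tp_42_1.txt"; "Ts_42_1.txt"; "Tp_21_2.txt"];

% Strip the extension and split every name on "_" in a single call.
% split returns an N-by-3 string array because every name has three parts.
stems = erase(names, ".txt");
parts = split(stems, "_");

% One row per file: kind is categorical, angle and run are numeric.
files = table(names, categorical(parts(:,1)), double(parts(:,2)), double(parts(:,3)), ...
    'VariableNames', {'name', 'kind', 'angle', 'run'});

% Logical indexing picks out one of the four groups.
tp21 = files(files.kind == 'Tp' & files.angle == 21, :);

% findgroups numbers every distinct (kind, angle) combination.
groupId = findgroups(files.kind, files.angle);
```

Keeping kind, angle, and run as separate columns means the four groups never have to be hard-coded; new angles simply show up as new group numbers.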
Let's look at a few of these concepts together with some sample code. I don't have your files, but I can demonstrate with just the output of the dir command and some dummy files... I'm going to break this into two parts: a dirTable function (a dir wrapper I like to use instead of dir, and keep on my path) and a script that uses it. I'd suggest you copy the code and take advantage of code sections to run a section at a time. See the doc page on Create and Run Sections in Code if this is new to you.
dirTable.m
% Filename: dirTable.m
function dirListing = dirTable( names )
    arguments
        names (:,1) string {mustBeText} = pwd()
    end
    dirListing = table();
    for nIdx = 1:numel( names )
        tempListing = dir( names( nIdx ) );
        dirListing = [ dirListing;...
            struct2table( tempListing,'AsArray', true ) ]; %#ok<AGROW>
    end
    if ~isempty( dirListing )
        %Adjust some of the types...
        dirListing.name = categorical( dirListing.name );
        dirListing.folder = categorical( dirListing.folder );
        dirListing.date = datetime( dirListing.date );
    end
end
Example script
%% Create some dummy files - assuming windows, or I'd use the "touch" cmd.
cmds = "type nul >> Ts_42_" + (1:3) + ".txt";
cmds = [cmds;"type nul >> Tp_42_" + (1:3) + ".txt"];
cmds = [cmds;"type nul >> Ts_21_" + (1:3) + ".txt"];
cmds = [cmds;"type nul >> Tp_21_" + (1:3) + ".txt"];
for idx = 1:numel(cmds)
system(cmds(idx));
end
%% Get the directory listing for all the files
% Note, the filenames come out as categoricals by my design, though that
% doesn't help much for this example - in fact - I'll have to cast to
% convert the categoricals to string a few times. That's ok, it's not a
% heavy lift. If you use dir directly, you'd not only be casting to
% string, but you'd also have to deal with the structure and clunky if/else
% conditions everywhere.
listing = dirTable();
%% Define patterns for each of the 4 groups
% - pretending the first code cell doesn't exist.
Tp_Angle1_pattern = "Tp_21";
Ts_Angle1_pattern = "Ts_21";
Tp_Angle2_pattern = "Tp_42";
Ts_Angle2_pattern = "Ts_42";
%% Cycle a group's data, creating a single table from all the files
% I could be more clever here and loop through the patterns as well and
% create a table of tables; however, I am going to keep this code easier
% to read at the cost of repetitiveness. I will however use a local
% function to gather all the runs from one group into a single table.
Tp_Angle1_matches = string(listing.name).startsWith(Tp_Angle1_pattern);
Tp_Angle1_filenames = string(listing.name(Tp_Angle1_matches));
Tp_Angle1_data = aggregateDataFilesToTable(Tp_Angle1_filenames);
% Repeat for each group... Or loop the above code for a single table
% if you loop for a single table, make sure to add column(s) for the group
% information
%% My local function for reading all the files in a group
function data_table = aggregateDataFilesToTable(filenames)
arguments
filenames (:,1) string
end
% We could assume that since we're using run numbers at the end of the
% filename, that we'll get the filenames pre-sorted for us. If not zero
% padding the numbers, then need to extract the run number to determine the
% sort order of the files to read in. I'm going to be lazy and assume zero
% padded for simplicity.
data_table = table();
for fileIdx = 1:numel(filenames)
% For the following line, two things:
% 1) the [data_table;readtable()] syntax appends the table from
% readtable to the end of data_table.
% 2) The comment at the end acknowledges that this variable is growing
% in a loop, which is usually not the best practice; however, since I
% have no way of knowing the total table dimensions ahead of time, I
% cannot pre-allocate the table before the loop - hence the table()
% call before the for loop. If you have a way of knowing this ahead of
% time, do pre-allocate!
data_table = [data_table;readtable(filenames(fileIdx))]; %#ok<AGROW>
end
end
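If the run numbers are not zero padded, the lazy assumption in the comments above breaks, because Tp_21_10.txt sorts before Tp_21_2.txt as text. A minimal sketch of sorting by the numeric run number instead, using made-up file names and assuming the Tp_21_ prefix from this example:

```matlab
% Hypothetical, deliberately unsorted file names without zero padding.
filenames = ["Tp_21_10.txt"; "Tp_21_2.txt"; "Tp_21_1.txt"];

% Pull out the text between the group prefix and the extension,
% then convert it to numbers so that 10 sorts after 2.
runText = extractBetween(filenames, "Tp_21_", ".txt");
runNumber = double(runText);

% Reorder the file names by run number.
[~, order] = sort(runNumber);
sortedFilenames = filenames(order);
```

The sorted array can then be passed to aggregateDataFilesToTable so that the rows end up in run order.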
NOTE 1: using empty parens is not necessary on function calls with no parameters; however, I find it to be easier for others to read when they know they are calling a function and not reading a variable.
NOTE 2: I know the dummy files are empty. That won't matter for this example, as an empty table appended to an empty table is another empty table. And the OP's question was about the file manipulation and grouping, etc.
NOTE 3: In case the syntax is new to you, BOTH functions in my example use function argument blocks, which were introduced in R2019b - and they make for much easier to maintain and read code than NOT validating inputs, or using more complex ways of validating inputs. I was going to leave that out of this example, but they were already in my dirTable function, so I figured I'd just explain it instead.
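For the "loop for a single table" variant mentioned in the script comments, here is a rough sketch. The file names and the "unmatched" label are assumptions made for illustration; in practice the names would come from the dir listing.

```matlab
% Hypothetical file names; the last one is there to show a near miss.
names = ["Tp_21_1.txt"; "Ts_21_1.txt"; "Tp_42_1.txt"; "Ts_42_1.txt"; "Tp_210_1.txt"];
prefixes = ["Tp_21"; "Ts_21"; "Tp_42"; "Ts_42"];

% Start with every file labelled as unmatched, then label each group in turn.
group = repmat("unmatched", size(names));
for k = 1:numel(prefixes)
    % The trailing underscore stops "Tp_21" from also matching "Tp_210_...".
    isMatch = startsWith(names, prefixes(k) + "_");
    group(isMatch) = prefixes(k);
end

% A single table with a column recording which group each file belongs to.
grouped = table(names, categorical(group), 'VariableNames', {'name', 'group'});
```

The group column can then be carried along when the file contents are read, so every data row still knows which group it came from.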

Related

Performing operations on a variable number of workspace elements in MATLAB

new MATLAB user here so apologies if this seems like a silly question. I have the following list of variables (doubles) in my workspace:
E1_01Strain E1_06Strain E1_07Strain E1_08Strain E1_09Strain E1_10Strain
E1_01Stress E1_06Stress E1_07Stress E1_08Stress E1_09Stress E1_10Stress
These are lists of numbers. I would like to remove the last n elements from each variable. I can do it with the command
E1_01Strain = E1_01Strain(1:end-100)
but it's impractical because later I'm going to have to do it on many, many more similar variables. Therefore I wanted to write a function that accepts as inputs a list of the workspace variables (as in, I highlight the variables I want and drag and drop into the function input) and removes from each one n elements.
I understand that I can write a function like this:
function [X1, X2, X3, X4] = Remove_n_elements(n, X1, X2, X3, X4)
X1 = X1(1:end-n);
X2 = X2(1:end-n);
X3 = X3(1:end-n);
X4 = X4(1:end-n);
end
but that would mean that I would have to change the number of inputs, outputs, and the lines of code in the function every time. I'm sure there's a better way to do it but I can't figure it out.
I keep thinking that there might be a way to do it by looping over all the inputs but I can't get it to work since (as far as I know) I need to create a list of the inputs and then the operation is performed only on the elements of that list, not the inputs themselves.
I was looking at Passing A Variable Number of Arguments into a Function and from that using inputParser from https://www.mathworks.com/help/matlab/matlab_prog/parse-function-inputs.html but since I'm new to MATLAB I'm not sure how to use it for my case.
I used the code provided by il_raffa for a bit but followed his advice and went back and reconsidered how the script functions. After some more digging I wrote the following script that does exactly what I need. This script extracts the columns des_cols from all .csv files in a folder and plots them together. It then makes another plot of the averages.
files = dir('*.csv'); % navigate to the folder that you want to run the script on in MATLAB
avgStress = [];
avgStrain = [];
set(groot, 'DefaultLegendInterpreter', 'none') % the names of my .csv files have underscores that I want to see in the legend, if you don't want this then comment this line
hold on; %comment this and hold off further down if you want separate plots for every .csv
for file = files'
csv = xlsread(file.name);
[n,s,r] = xlsread(file.name);
des_cols = {'Stress','Ext.1(Strain)'}; % type here the names of the columns you want to extract
colhdrs = s(2,:);
[~,ia] = intersect(colhdrs, des_cols);
colnrs = flipud(ia);
file.name = n(:, colnrs);
file.name = file.name(1:end-600,:); % I wanted to remove the last 600 rows but if you want them all, remove the -600
plot(file.name(:,2),file.name(:,1),'DisplayName',s{1,1});
avgStress = [avgStress file.name(1:1500,1)]; % calculates the average stress for the first 1500 points, you can change it to whatever you want
avgStrain = [avgStrain file.name(1:1500,2)];
end
ylabel({'Stress (MPa)'}); % y-axis label
xlabel({'Strain (%)'}); %x-axis label
title({'E_2'}); % title of the plot
legend('show');
hold off; % comment this if you want different plots for all .csv files
avgStress = mean(avgStress,2);
avgStrain = mean(avgStrain,2);
plot(avgStrain,avgStress);
This creates two plots, one with all the raw data and another with just the averages. I hope this helps anyone that might have a similar issue.
The best thing you can do is to review the architecture of your software so that you do not need to perform such operations on the Workspace variables.
That is: how are those variables created? Are they loaded from a ".mat" file? etc.
Anyway, in order to avoid using the eval function and given your situation, a possible approach could be:
identify the names of the variables by using the function who. You can specify the root name of the variables in the call to who and use * as a wildcard, for example who('E1*'). Make sure it matches the desired variables. You can also use regexp to refine the selection of the variables
save these variables in a temporary .mat file: the name (including the path) of the temporary file can be created with the function tempname
load the temporary .mat file into an output variable: this will create a struct in the Workspace whose fields are the variables you want to modify
call the function to remove the undesired elements from the fields of the struct. The function has to return the updated struct
save the updated struct in the temporary file by specifying the option -struct, which stores each field of the struct as a separate variable
load the temporary file again: this restores the content of the file as separate variables
The function to remove the undesired elements can be made as follows:
get the names of the struct's fields by using the function fieldnames
loop over the fields of the struct by using dynamic field names
remove the undesired elements from the fields
return the updated struct
A possible implementation could be:
Code "before" the call to the function
% Get the names of the variables
list_var=who('E1*')
% Define the name of a temporary ".mat" file
tmp_file=tempname
% Save the variables in the temporary ".mat" file
save(tmp_file,list_var{:});
% Load the variables in a struct
sel_vars=load(tmp_file);
% Call the function to remove the elements
out_str=Remove_n_elements(8,sel_vars)
Function to remove the undesired elements
function sel_vars=Remove_n_elements(n,sel_vars)
% Get the names of the fields of the struct
var_names=fieldnames(sel_vars);
% Loop over the fields and remove the undesired elements
for i=1:length(var_names)
    % The semicolon stops the whole struct from being printed on every pass
    sel_vars.(var_names{i})=sel_vars.(var_names{i})(1:end-n);
end
Code "after" the call to the function
% Save the updated struct in the temporary ".mat" file
save(tmp_file,'-struct','out_str')
% Load the updated struct as separate variables
load(tmp_file)
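If the data can be kept in a struct from the start (the kind of architectural change suggested above), the round trip through a temporary .mat file is not needed. A minimal sketch, assuming made-up field contents that stand in for the real stress and strain vectors:

```matlab
% Hypothetical data: each field plays the role of one workspace variable.
data = struct();
data.E1_01Strain = (1:10)';
data.E1_01Stress = (101:110)';
data.E1_06Strain = (1:12)';

% Number of trailing elements to remove from every field.
n = 3;

% structfun applies the same trimming to every field; 'UniformOutput', false
% makes it return a struct with the same field names instead of a vector.
trimmed = structfun(@(v) v(1:end-n), data, 'UniformOutput', false);
```

Adding more variables then only means adding more fields; the trimming line does not change.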

Struggling to index into a column vector produced using the eval function

I have several tables, each with a massive amount of data (20MB each). Let's call them Table_1, Table_2, Table_3, etc. The structure of these tables is similar, with identical column headers (but a varying number of rows). Let's say Attribute_A, Attribute_B, etc.
I am interested in performing calculations on Attribute_B for each table. These calculations involve looping through Attribute_B.
I start by creating a master list with all of my table names.
Table_List = [Table_1, Table_2, ... Table 10];
Now I can iterate through this and perform my calculations. Doing single calculations works just fine using eval and a concatenation of my column vector name as a function of the value of Current_Table:
for Current_Table = Table_List
Peak_Value = max(eval(strcat(Current_Table,"Attribute_C")));
I run into trouble when I perform calculations that require iterating through the column vectors. For instance, the following fails.
for Current_Table = Table_List
for i = 1:length(eval(strcat(Current_Table,".Attribute_B")))
X = X + eval(strcat(Current_Table,".Attribute_B"))(i);
MATLAB gets hung up when I try to marry the evaluated column and the desired index value of i. Is there a way to do this?
I understand if I made a single structure of all of the data this would be much easier (not using the string list but actually combining the data). I want to iterate through each of my tables without re-writing data.
As excaza pointed out, usage of eval in this kind of problem is greatly discouraged. One safer and more elegant approach is to use structures.
Here is an example program that uses structures and should be useful to you:
clear
% Initialize the structure that will hold all the tables
TABLES = struct;
% List of table names
table_names = {'Table_1' 'Table_2' 'Table_3'};
% Number of tables
nTables = length(table_names);
% Fill the tables with some data
% TABLES can also be nested structures with more than one data item in them in
% each substructure
for itb=1:nTables
% TABLES.(table_names{itb}) = rand(3);
% Load from files with names Table_N.txt
TABLES.(table_names{itb}) = load([table_names{itb},'.txt']);
end
% Perform operations on the components
for jtb=1:nTables
% Using a temporary array
% temp = [];
% temp = TABLES.(table_names{jtb});
% A(jtb) = max(max(temp));
% B(jtb) = temp(2,1);
% .. and such
% Or directly
A(jtb) = max(max(TABLES.(table_names{jtb})));
B(jtb) = TABLES.(table_names{jtb})(2,1);
end
You have a list of names and you use them to create either nested structures or anything else - like arrays here - inside of the main structure called TABLES. Here the data is populated from files with load; the commented-out rand line can be used instead for a quick test. Both collecting the data and any operations on it can be done with loops. In the second loop you have an example of how you can manipulate the data directly or indirectly using temporary arrays.
This is a relatively broad topic so feel free to ask questions and I'll add them to my answer.
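The same idea carries over to real tables with named columns. The sketch below uses two small made-up tables in place of Table_1, Table_2, etc. Pulling the column into a variable first is what makes element-wise indexing straightforward, and as far as I know MATLAB does not actually copy the data unless that variable is modified.

```matlab
% Hypothetical stand-ins for the real tables, stored as fields of one struct.
tables = struct();
tables.Table_1 = table([1; 2; 3], [10; 20; 30], 'VariableNames', {'Attribute_A', 'Attribute_B'});
tables.Table_2 = table([4; 5], [40; 50], 'VariableNames', {'Attribute_A', 'Attribute_B'});

tableNames = fieldnames(tables);
totals = zeros(numel(tableNames), 1);

for k = 1:numel(tableNames)
    % Dynamic field name selects the table; dot indexing selects the column.
    column = tables.(tableNames{k}).Attribute_B;
    % Element-wise loop over the column, as in the question.
    for i = 1:numel(column)
        totals(k) = totals(k) + column(i);
    end
end
```

For a plain sum the inner loop could be replaced by sum(column); it is written out here only because the question needs per-element access.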

MATLAB: How can I efficiently read in these data files?

I have 100 data files in a folder called "Experiment1", and I need to take all of the data from them and put them into a single matrix. Each data file contains 15 columns and 40 rows of data.
The order in which the files are in the folder is arbitrary. It doesn't matter in what order they get put into the combined matrix.
I've written some code using dlmread that will do the job:
for i = 1:100
%% Read in the relevant file.
filename = ['File_' int2str(i) '.dat']
Data = dlmread(fullfile(pwd, 'Experiment1',filename));
%% Put all the data I need in a separate matrix
NeededData(1+((i-1)*40):i+((i-1)*40)-i+40,1:15) = Data(:,1:15);
end
However, there are two things I don't like about my code.
The files have random names at present, and I'd need to manually change all their names to "File_1.dat", "File_2.dat", etc.
The code is cumbersome and hard to read.
How could I do things better?
Since you've fixed the problem of defining the name of the files to be read with dir, you can improve the way you add the read data (Data) to the output matrix (NeededData).
You can simultaneously read the input files and add the data to the output matrix by inserting the call to dlmread directly in the assignment statement:
files=dir('*.dat');
n_files=length(files)
% Initialize the output matrix as empty
NeededData_0=[]
for i=1:n_files
% Simultaneously read input file and assign data to the output matrix
NeededData_0=[NeededData_0;dlmread(files(i).name)]
end
In case you prefer working with the indices (as in your original approach), since you know in advance that all the files have the same number of rows (40), you can simplify the notation as follows:
files=dir('*.dat');
n_files=length(files)
% Define the number of rows in each input file
n_rows=40;
% Define the number of columns in each input file
n_col=15;
NeededData_2=nan(n_rows*n_files,n_col)
% Define the sequence of rows
r_list=1:n_rows:n_rows*n_files
for i=1:n_files
Data=dlmread(files(i).name)
NeededData_2(r_list(i):r_list(i)+n_rows-1,:)=Data
end
Hope this helps.
Using the suggestion to use dir present in the answers I have made the following code, which is clearly an improvement on my earlier effort. I would welcome further improvements, or suggestions for alternative approaches.
files = dir('*.dat');
for i = 1:length({files.name})
%% Read in the relevant file.
Data = dlmread(files(i).name);
%% Put all the data I need in a separate matrix
NeededData(1+((i-1)*40):i+((i-1)*40)-i+40,1:15) = Data(:,1:15);
end
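One further alternative, offered as a sketch and not as the definitive way to do it: collect each file's data in a cell array and concatenate once at the end, which avoids both the index arithmetic and growing a matrix inside the loop. This example assumes R2019a or later for writematrix and readmatrix (dlmread would fit in the same spot), and it creates its own dummy files in a temporary folder so that it can run anywhere.

```matlab
% Create a temporary folder with three dummy 40-by-15 data files.
folder = tempname;
mkdir(folder);
for k = 1:3
    writematrix(k * ones(40, 15), fullfile(folder, "run" + k + ".dat"));
end

% List the files, whatever their names happen to be.
files = dir(fullfile(folder, '*.dat'));

% Read each file into its own cell, then stack all blocks in one step.
blocks = cell(numel(files), 1);
for k = 1:numel(files)
    blocks{k} = readmatrix(fullfile(files(k).folder, files(k).name));
end
allData = vertcat(blocks{:});

% Remove the temporary folder and its contents.
rmdir(folder, 's');
```

Because the blocks are stacked at the end, this also works when the files do not all have the same number of rows.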

csvwrite in loop with numbered filenames in matlab

I'm fairly new to MATLAB. I searched the csvwrite documentation and some existing questions but couldn't find a way to put the values of my variables into the output file names when exporting to CSV. In the script below, I would like the output files to be named something like output_$aa_$dd.csv, where aa and dd are the counters of the first and second for loops, respectively.
for aa=1:27
for dd=1:5
M_Normal=bench(aa,dd).Y;
for j=1:300
randRand=M_Normal(randperm(12000,12000));
for jj = 1:numel(randMin(:,1)); % loops over the rand numbers
vv= randMin(jj,1); % gets the value
randMin(jj,j+1)=min(randRand(1:vv)); % get and store the min of the selction in the matix
end
end
csvwrite('/home/amir/amir_matlab/sprintf(''%d%d',aa, bb).csv',randMin);
end
end
String concatenation in MATLAB is done like a matrix concatenation. For example
a='app';
b='le';
c=[a,b] % returns 'apple'
Hence, in your problem, the full path can be formed this way (using your loop counters aa and dd; bb is never defined in your script).
['/home/amir/amir_matlab/',sprintf('%d_%d',aa,dd),'.csv']
Furthermore, it is usually best not to specify the file separator explicitly, so that your code can run on other operating systems. I suggest you build the full path with fullfile, using filesep for the leading root separator:
fullfile(filesep,'home','amir','amir_matlab',sprintf('%d_%d.csv',aa,dd))
Cheers.
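Putting the pieces together, here is a small self-contained sketch of numbered output files written from a nested loop. The output folder is a temporary one, the loop bounds are shrunk, and random data stands in for randMin, all of which are assumptions for the sake of a runnable example. It uses writematrix (R2019a or later); csvwrite accepts the same file name.

```matlab
% Temporary output folder so the example does not touch real data.
outDir = tempname;
mkdir(outDir);

for aa = 1:2
    for dd = 1:3
        % Build the file name from both loop counters, e.g. output_1_3.csv.
        outFile = fullfile(outDir, sprintf('output_%d_%d.csv', aa, dd));
        % Placeholder data standing in for randMin.
        writematrix(rand(4, 2), outFile);
    end
end

% Check what was written.
written = dir(fullfile(outDir, 'output_*.csv'));
writtenNames = sort(string({written.name}));

% Remove the temporary folder and its contents.
rmdir(outDir, 's');
```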

Extracting variables while reading in data files

I am quite new to data analysis, so if this is a rookie question, I'm sorry, I am learning as I go.
I have just started doing some work in variable star astronomy. I have about 100 files for every night of observation that all contain the same basic information (star coordinates, magnitude, etc.). I am loading all of the files into my workspace as arrays using a for-loop
files = dir('*.out');
for i=1:length(files)
eval(['load ' files(i).name ' -ascii']);
end
I'm only really interested in two columns in each file. Is there a way to extract a column and set it to a vector while this for-loop is running? I'm sure that it's possible, but the actual syntax for it is escaping me.
Try using load as a function and save its output to a variable:
files = dir('*.out');
twoCols = {};
for ii=1:length(files)
data = load( files(ii).name, '-ascii' ); % load file into "data"
twoCols{ii} = data(:,1:2); % take only two columns
end
Now variable twoCols holds the two columns of each file in a different cell.
You have to assign the load result to a new variable. Then if, let's say, your variable is starsInfo, you can use
onlyTwoFirst = starsInfo(:,1:2)
That means take all the rows, but only columns 1 and 2.
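The two columns of interest do not have to be the first two. A minimal sketch, assuming the columns you want are 1 and 3 and using dummy .out files written to a temporary folder in place of the real observation files:

```matlab
% Create two dummy ASCII files in a temporary folder.
folder = tempname;
mkdir(folder);
for k = 1:2
    M = k * magic(4);
    save(fullfile(folder, sprintf('night%d.out', k)), 'M', '-ascii');
end

% Hypothetical choice of columns to keep from every file.
wantedCols = [1 3];

files = dir(fullfile(folder, '*.out'));
selected = cell(numel(files), 1);
for k = 1:numel(files)
    % load returns the file contents as a matrix when used as a function.
    data = load(fullfile(files(k).folder, files(k).name), '-ascii');
    % Keep all rows, but only the wanted columns.
    selected{k} = data(:, wantedCols);
end

% Remove the temporary folder and its contents.
rmdir(folder, 's');
```

Each cell of selected then holds an N-by-2 matrix for one file, so the individual vectors are selected{k}(:,1) and selected{k}(:,2).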