Extracting variables while reading in data files - matlab

I am quite new to data analysis, so if this is a rookie question, I'm sorry, I am learning as I go.
I have just started doing some work in variable star astronomy. I have about 100 files for every night of observation that all contain the same basic information (star coordinates, magnitude, etc.). I am loading all of the files into my workspace as arrays using a for-loop:
files = dir('*.out');
for i=1:length(files)
eval(['load ' files(i).name ' -ascii']);
end
I'm only really interested in two columns in each file. Is there a way to extract a column and set it to a vector while this for-loop is running? I'm sure that it's possible, but the actual syntax for it is escaping me.

Try using load as a function and save its output to a variable:
files = dir('*.out');
twoCols = {};
for ii=1:length(files)
data = load( files(ii).name, '-ascii' ); % load file into "data"
twoCols{ii} = data(:,1:2); % take only two columns
end
Now the variable twoCols holds the first two columns of each file, one cell per file.

You have to assign the result of load to a new variable. Then, if your variable is called starsInfo, say, you can use
onlyTwoFirst = starsInfo(:,1:2)
That means take all the rows, but only columns 1 and 2.
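To pick columns that are not the first two, pass a vector of column indices instead of a range. The following is a minimal sketch under stated assumptions: the scratch folder, the file names night1.out and night2.out, and the column indices [1 3] are all made up for illustration, and the loop only collects the chosen columns before stacking them once at the end.

```matlab
% Hypothetical demo data: two small ASCII files in a scratch folder.
workDir = tempname;
mkdir(workDir);
A = [1 2 3; 4 5 6];
B = [7 8 9];
save(fullfile(workDir, 'night1.out'), 'A', '-ascii');
save(fullfile(workDir, 'night2.out'), 'B', '-ascii');

files = dir(fullfile(workDir, '*.out'));
colsOfInterest = [1 3];              % assumed column indices, adjust to your data
picked = cell(numel(files), 1);      % one cell per file
for k = 1:numel(files)
    data = load(fullfile(workDir, files(k).name), '-ascii');
    picked{k} = data(:, colsOfInterest);   % all rows, chosen columns only
end
allRows = vertcat(picked{:});        % stack the per-file pieces into one matrix
```

Keeping the per-file pieces in a cell array means you can still tell the files apart before deciding whether to stack them.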

Related

Separating Files Based on their names

New to Matlab so this may be more simple than I'm realising.
I'm working with a large number of text files for some data analysis and would like to separate them into a number of categories. They come in a format similar to Tp_angle_RunNum.txt and Ts_angle_RunNum.txt. I would like to have 4 different groups: the Tp and Ts files for angle1, and the same for angle2.
Any help would be appreciated!
A few things here:
Remember that your text files are not necessarily going to be where you process your data, so think about how you want to store the data in memory while you work with it. It may be beneficial to pull all of the data into a MATLAB table. You can read your text files directly into tables with the function readtable(), with columns to capture Ts, Tp, and angle... or angle1 and angle2. You can use logicals for all 4, i.e., a true or false indicating which group each data row belongs to. You could also capture the run number so you know exactly which file it came from. There are lots of ways to store the data, but if you get into very large data sets, tables are far easier to manipulate and work with. They are also very fast, and compact if you use categorical types where applicable.
dir() will likely be a necessary function for you for the dir listing, however there are some alternatives and some considerations based upon your platform. I suggest you take a look at the doc page for dir.
Take advantage of MATLAB strings and vector processing. It's not the only way to do things, but it is by far the easiest. Strings were introduced in R2016b, and have gotten better since as more and more features and capabilities work with strings. More recently, you can use patterns instead of (or with!) regular expressions for finding what you need to process. The doc page I linked above has some great examples, so no point in me reinventing that wheel.
fileparts() is also your friend in MATLAB when working with many files. You can use it to separate the path, the filename and the extension. You might use it to simplify your processing.
Regarding vector processing, you can take an entire array of strings and pass it to the functions used with pattern. Or, if you take my suggestion of working with tables, you can get rows of your table that match specific characteristics.
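As a small illustration of selecting table rows by group, here is a sketch with made-up values. The column names polarization, angle, and run are assumptions chosen for this example, not anything taken from the original files.

```matlab
% Hypothetical table: one row per measurement, with the group stored in columns.
T = table(categorical(["Tp"; "Ts"; "Tp"; "Ts"]), [21; 21; 42; 42], [1; 1; 2; 2], ...
    'VariableNames', {'polarization', 'angle', 'run'});

% Logical indexing: keep only the rows that belong to the Tp / angle 42 group.
rows = T.polarization == 'Tp' & T.angle == 42;
subset = T(rows, :);
```

The same logical vector can be reused to pull matching rows out of any other column of T.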
Let's look at a few of these concepts together with some sample code. I don't have your files, but I can demonstrate with just the output of the dir command and some dummy files... I'm going to break this into two parts: a dirTable function (a dir wrapper I like to use instead of dir, and keep on my path) and a script that uses it. I'd suggest you copy the code and take advantage of code sections to run a section at a time. See the doc page on Create and Run Sections in Code if this is new to you.
dirTable.m
% Filename: dirTable.m
function dirListing = dirTable( names )
arguments
names (:,1) string {mustBeText} = pwd()
end
dirListing = table();
for nIdx = 1:numel( names )
tempListing = dir( names( nIdx ) );
dirListing = [ dirListing;...
struct2table( tempListing,'AsArray', true ) ]; %#ok<AGROW>
end
if ~isempty( dirListing )
%Adjust some of the types...
dirListing.name = categorical( dirListing.name );
dirListing.folder = categorical( dirListing.folder );
dirListing.date = datetime( dirListing.date );
end
end
Example script
%% Create some dummy files - assuming windows, or I'd use the "touch" cmd.
cmds = "type nul >> Ts_42_" + (1:3) + ".txt";
cmds = [cmds;"type nul >> Tp_42_" + (1:3) + ".txt"];
cmds = [cmds;"type nul >> Ts_21_" + (1:3) + ".txt"];
cmds = [cmds;"type nul >> Tp_21_" + (1:3) + ".txt"];
for idx = 1:numel(cmds)
system(cmds(idx));
end
%% Get the directory listing for all the files
% Note, the filenames come out as categoricals by my design, though that
% doesn't help much for this example - in fact - I'll have to cast to
% convert the categoricals to string a few times. That's ok, it's not a
% heavy lift. If you use dir directly, you'd not only be casting to
% string, but you'd also have to deal with the structure and clunky if/else
% conditions everywhere.
listing = dirTable();
%% Define patterns for each of the 4 groups
% - pretending the first code cell doesn't exist.
Tp_Angle1_pattern = "Tp_21";
Ts_Angle1_pattern = "Ts_21";
Tp_Angle2_pattern = "Tp_42";
Ts_Angle2_pattern = "Ts_42";
%% Cycle a group's data, creating a single table from all the files
% I could be more clever here and loop through the patterns as well and
% create a table of tables; however, I am going to keep this code easier
% to read at the cost of repetitiveness. I will however use a local
% function to gather all the runs from one group into a single table.
Tp_Angle1_matches = string(listing.name).startsWith(Tp_Angle1_pattern);
Tp_Angle1_filenames = string(listing.name(Tp_Angle1_matches));
Tp_Angle1_data = aggregateDataFilesToTable(Tp_Angle1_filenames);
% Repeat for each group... Or loop the above code for a single table
% if you loop for a single table, make sure to add column(s) for the group
% information
%% My local function for reading all the files in a group
function data_table = aggregateDataFilesToTable(filenames)
arguments
filenames (:,1) string
end
% We could assume that since we're using run numbers at the end of the
% filename, that we'll get the filenames pre-sorted for us. If not zero
% padding the numbers, then need to extract the run number to determine the
% sort order of the files to read in. I'm going to be lazy and assume zero
% padded for simplicity.
data_table = table();
for fileIdx = 1:numel(filenames)
% For the following line, two things:
% 1) the [data_table;readtable()] syntax appends the table from
% readtable to the end of data_table.
% 2) The comment at the end acknowledges that this variable is growing
% in a loop, which is usually not the best practice; however, since I
% have no way of knowing the total table dimensions ahead of time, I
% cannot pre-allocate the table before the loop - hence the table()
% call before the for loop. If you have a way of knowing this ahead of
% time, do pre-allocate!
data_table = [data_table;readtable(filenames(fileIdx))]; %#ok<AGROW>
end
end
NOTE 1: using empty parens is not necessary on function calls with no parameters; however, I find it to be easier for others to read when they know they are calling a function and not reading a variable.
NOTE 2: I know the dummy files are empty. That won't matter for this example, as an empty table appended to an empty table is another empty table. And the OP's question was about the file manipulation and grouping, etc.
NOTE 3: In case the syntax is new to you, BOTH functions in my example use function argument blocks, which were introduced in R2019b - and they make for much easier to maintain and read code than NOT validating inputs, or using more complex ways of validating inputs. I was going to leave that out of this example, but they were already in my dirTable function, so I figured I'd just explain it instead.
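If the run numbers are not zero padded, the sort order mentioned in the comments of aggregateDataFilesToTable has to be worked out from the numbers themselves. The following is a rough sketch of one way to do that, assuming every name follows the prefix_angle_run.txt layout exactly; the names in the list are invented, and split will raise an error if some names contain a different number of underscores.

```matlab
% Hypothetical file names, deliberately out of numeric order.
names = ["Tp_21_10.txt"; "Tp_21_2.txt"; "Ts_42_1.txt"; "Tp_21_1.txt"];

base = erase(names, ".txt");         % drop the extension
parts = split(base, "_");            % N-by-3: prefix, angle, run number
runNum = str2double(parts(:, 3));    % numeric run numbers

% Pick one group, then order its files by run number rather than by text.
isGroup = parts(:, 1) == "Tp" & parts(:, 2) == "21";
groupNames = names(isGroup);
[~, order] = sort(runNum(isGroup));
groupNames = groupNames(order);
```

The sorted groupNames could then be passed to a reader such as aggregateDataFilesToTable.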

How to loop through multiple structures and perform the same operations [Matlab]

I am trying to loop through multiple structures at once, extract variables of interest, and combine them into a single cell array. The problem: all of the variables have the same names. I have a working pseudocode--here it is:
Let's say I load i structures in my workspace. Now I want to loop through each structure, and extract time and position data from each structure.
First, I load my structures. Something like...
data_1
data_2
data_3
Then, I create appropriately sized cell arrays.
time{i,:} = zeros(size(structures));
position{i,:} = zeros(size(structures));
Finally, I loop through my structures to extract cell arrays and create a single array.
for i = 1:size(structures)
time_i= data_i.numbers.time;
position_i= data_i.numbers.position;
time {i,:} = time_i;
position{i,:} = position_i;
end
I want to end with a cell array containing a concatenation of all the variables in a single cell structure.
Could you please help convert my pseudo code/ideas into a script, or point me to resources that might help?
Thanks!
You're likely going to be better off loading your data inside the loop and storing it in a cell array or structure rather than trying to deal with iteratively named variables in your workspace. eval is, in nearly all cases, a significant code smell, not least because MATLAB's JIT compiler ignores eval statements, so you get none of the engine's optimizations. eval statements are also difficult to parse, debug, and maintain.
An example of a stronger approach:
% nfiles and thefilenames are assumed to be defined beforehand, e.g. from dir
for ii = 1:nfiles
    tmp = load(thefilenames{ii}); % Or use output of dir
    trialstr = sprintf('trial_%u', ii); % Generate trial string
    data.(trialstr).time = tmp.numbers.time;
    data.(trialstr).position = tmp.numbers.position;
end
Which leaves you with a final data structure of:
data
    trial_n
        time
        position
Which is far easier to iterate through later.
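For instance, once the data structure exists, fieldnames lets you walk every trial and collect the pieces into cell arrays. This is only a sketch: the structure is filled with made-up samples here so the snippet runs on its own, and the names allTime and allPosition are chosen for illustration.

```matlab
% Build a stand-in for the loaded structure (values are invented).
data = struct();
for ii = 1:3
    trialstr = sprintf('trial_%u', ii);
    data.(trialstr).time = (0:ii)';           % ii+1 samples per trial
    data.(trialstr).position = ii * (0:ii)';
end

% Walk the trials by field name and collect each variable into a cell array.
trials = fieldnames(data);
time = cell(numel(trials), 1);
position = cell(numel(trials), 1);
for ii = 1:numel(trials)
    time{ii} = data.(trials{ii}).time;
    position{ii} = data.(trials{ii}).position;
end

% Optional: stack everything into single column vectors.
allTime = vertcat(time{:});
allPosition = vertcat(position{:});
```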
My final script for anyone interested:
for i = 1:4 %for 4 structures that I am looping through
eval(['time_',num2str(i),'= data_',num2str(i),'.numbers.time;']);
eval(['position_',num2str(i),'= data_',num2str(i),'.numbers.position;']);
%concatenate data into a single cell array here
time{i} = {eval(['time_',num2str(i)])};
position{i} = {eval(['position_',num2str(i)])};
end

MATLAB: How can I efficiently read in these data files?

I have 100 data files in a folder called "Experiment1", and I need to take all of the data from them and put them into a single matrix. Each data file contains 15 columns and 40 rows of data.
The order in which the files are in the folder is arbitrary. It doesn't matter in what order they get put into the combined matrix.
I've written some code using dlmread that will do the job:
for i = 1:100
%% Read in the relevant file.
filename = ['File_' int2str(i) '.dat']
Data = dlmread(fullfile(pwd, 'Experiment1',filename));
%% Put all the data I need in a separate matrix
NeededData(1+((i-1)*40):i+((i-1)*40)-i+40,1:15) = Data(:,1:15);
end
However, there are two things I don't like about my code.
The files have random names at present, and I'd need to manually change all their names to "File_1.dat", "File_2.dat", etc.
The code is cumbersome and hard to read.
How could I do things better?
Since you've fixed the problem of defining the name of the files to be read with dir, you can improve the way you add the read data (Data) to the output matrix (NeededData).
You can simultaneously read the input files and add the data to the output matrix by inserting the call to dlmread directly in the assignment statement:
files = dir('*.dat');
n_files = length(files);
% Initialize the output matrix as empty
NeededData_0 = [];
for i = 1:n_files
    % Simultaneously read the input file and append its data to the output matrix
    NeededData_0 = [NeededData_0; dlmread(files(i).name)];
end
In case you prefer working with the indices (as in your original approach), since you know in advance that all the files have the same number of rows (40), you can simplify the notation as follows:
files = dir('*.dat');
n_files = length(files);
% Define the number of rows in each input file
n_rows = 40;
% Define the number of columns in each input file
n_col = 15;
% Pre-allocate the output matrix
NeededData_2 = nan(n_rows*n_files, n_col);
% Define the first output row for each file
r_list = 1:n_rows:n_rows*n_files;
for i = 1:n_files
    Data = dlmread(files(i).name);
    NeededData_2(r_list(i):r_list(i)+n_rows-1, :) = Data;
end
Hope this helps.
Using the suggestion to use dir present in the answers I have made the following code, which is clearly an improvement on my earlier effort. I would welcome further improvements, or suggestions for alternative approaches.
files = dir('*.dat');
for i = 1:length({files.name})
%% Read in the relevant file.
Data = dlmread(files(i).name);
%% Put all the data I need in a separate matrix
NeededData(1+((i-1)*40):i+((i-1)*40)-i+40,1:15) = Data(:,1:15);
end
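One further variation, offered as a sketch rather than a definitive answer: read each file into its own cell and concatenate once after the loop, which avoids both the index arithmetic and growing a matrix inside the loop. The scratch folder, file names, and the small 4-by-3 blocks below are stand-ins invented so the example is self-contained; dlmwrite is used only to create the demo files.

```matlab
% Create three small demo files with arbitrary names in a scratch folder.
workDir = tempname;
mkdir(workDir);
demoNames = {'x.dat', 'q.dat', 'm.dat'};
for k = 1:numel(demoNames)
    dlmwrite(fullfile(workDir, demoNames{k}), k * ones(4, 3));
end

% Read every .dat file, whatever it is called, into its own cell.
files = dir(fullfile(workDir, '*.dat'));
blocks = cell(numel(files), 1);
for k = 1:numel(files)
    blocks{k} = dlmread(fullfile(workDir, files(k).name));
end

% Concatenate all blocks vertically in a single step.
NeededData = vertcat(blocks{:});
```

This also works when the files have different numbers of rows, as long as the column counts agree.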

How to concatenate all the variables of two different matlab files?

I have two mat files with identical list of variables.
In file1.mat
Variables   Value     Type
Time        [100x1]   double
Force       [100x1]   double
In file2.mat
Variables   Value     Type
Time_1      [90x1]    double
Force_1     [90x1]    double
I would like to vertically concatenate the variables from these two files. The suffix _1 on the variable names in file2 changes to _2, _32, etc. in other files.
How can I refer the variables and concatenate them in a loop so I don't have to open the file every time and enter the variable names in vertcat?
You can use two nice properties of the load command for this task. Firstly, load with an output argument creates a structure with field names equal to the variable names, which means that you can load data without having to know ahead of time what the variables were named. Secondly, the fields are assigned in alphabetical order, which means that force will always be the first field, and time the second field.
Combining these properties, you can do the following:
%# get a listing of all save files
fileList = dir('file*');
nFiles = length(fileList);
loadedData = cell(nFiles,2); %# for storing {force,time}
%# loop through files and write results into loadedData
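for iFile = 1:nFiles
    tmp = load(fileList(iFile).name); %# dir returns a struct array, so index with () and use .name
    loadedData(iFile,:) = struct2cell(tmp)';
end
%# catenate the contents of the cells, not the cells themselves
time = cat(1,loadedData{:,2});
force = cat(1,loadedData{:,1});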
Note that if your files are called file1...file10, instead of file001...file010, the alphabetical ordering you get from using the dir command may not be ideal. In that case, you may have to extract the number at the end of the file name and re-order the list.
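If you would rather not depend on the order of the fields, you can look the variables up by the start of their names instead. The sketch below assumes each file holds exactly one variable starting with Time and one starting with Force; the scratch folder and the short vectors are invented so the example can run by itself.

```matlab
% Create two demo MAT-files whose variable names differ only by suffix.
workDir = tempname;
mkdir(workDir);
Time = (1:3)';    Force = 10 * (1:3)';
save(fullfile(workDir, 'file1.mat'), 'Time', 'Force');
Time_1 = (4:5)';  Force_1 = 10 * (4:5)';
save(fullfile(workDir, 'file2.mat'), 'Time_1', 'Force_1');

fileList = dir(fullfile(workDir, 'file*.mat'));
allTime = [];
allForce = [];
for k = 1:numel(fileList)
    s = load(fullfile(workDir, fileList(k).name));
    f = fieldnames(s);
    % Match on the leading characters so any suffix (_1, _2, _32, ...) works.
    timeField = f{strncmp(f, 'Time', 4)};
    forceField = f{strncmp(f, 'Force', 5)};
    allTime = [allTime; s.(timeField)];      %#ok<AGROW>
    allForce = [allForce; s.(forceField)];   %#ok<AGROW>
end
```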
Does the following code snippet help to solve your problem?
Time_1 = [1; 2];
Time_2 = [2; 3];
Time_3 = 4;
All = [];
for i = 1:3
CurTime = eval(horzcat('Time_', num2str(i)));
All = [All; CurTime];
end
Essentially, I'm looping over the suffixes of Time_1, Time_2, and Time_3. On each iteration, I obtain the pertinent Time_x variable by building its name as a string and then assigning its value to CurTime using the eval function. Then I simply perform the desired concatenation using CurTime.
Of course, this code is not very efficient as All is growing within the loop. If you know the size of All beforehand, then you can pre-allocate it. If the size is unknown before the fact, you can implement the solution here (or else just pre-allocate it as being arbitrarily large and then cut it down to size once the loop is complete).
Let me know if I've misunderstood the problem you're having and I'll try and come up with something more helpful.
Also, some people think eval is evil. Certainly if your code contains 10 different calls to eval then you're probably doing it wrong.

How to save data using matfile Matlab

New to matlab and I need some help.
I need to create a .mat file, using matObj or save(), that holds some information passed from a variable. Let's say that variable is x = 1,2,3,4,5:
1|2|3|4|5|
Then I need to save that in test.mat
Then I need to load that file and save something like,
6|7|8|9|10|
So I get
1|2|3|4|5|
6|7|8|9|10|
and so on.
So every time I save, the data goes to a new row. The numbers that go inside are not random; the numbers above are just there to keep things simple.
Can someone help me out?
You are describing two different problems here. The first is saving and loading of data.
Saving is easy:
x = 1:5;
filename = 'myFile.mat';
save(filename, 'x'); %notice that I used the string name of the variable
Likewise loading is also simple:
filename = 'myFile.mat';
data = load(filename); % loaded variables are placed in a struct to prevent overwriting workspace variables
x = data.x;
The 2nd problem can be solved using concatenation:
Let's say you want to convert the vector 1 2 3 into the matrix:
1 2 3
1 2 3
You can simply call:
v = 1:3;
m = cat(1, v, v);
Likewise you can add an additional row to the existing matrix using the same command:
m = cat(1, m, v);
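Putting the two parts together, appending a row to a variable that already lives in a MAT-file can be sketched as load, concatenate, save. The file name appendRowsDemo.mat in the temporary folder is a placeholder chosen for this example.

```matlab
% Hypothetical file location, used only for this demonstration.
filename = fullfile(tempdir, 'appendRowsDemo.mat');

% First save: a single row.
x = 1:5;
save(filename, 'x');

% Later: load the stored variable, add a new row underneath, and save again.
newRow = 6:10;
s = load(filename);          % s.x holds the rows saved so far
x = [s.x; newRow];           % vertical concatenation
save(filename, 'x');

% Read it back to confirm what is stored.
check = load(filename);
```

Each repetition of the load, concatenate, save step adds one more row, provided the new row has the same number of columns.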
I'm sure any amount of googling will show you how to save a variable to a mat file. The MATLAB docs are absolutely spectacular, and such a simple operation will be covered along with examples showing exactly how to use the functions.
As for the second part, use the concatenation property
new = [old1 old2];
to concatenate horizontally, and
new = [old1;old2];
to concatenate vertically. Then resave the same way that you just learned via google.
Hope this helps, and in the future, I guarantee 99% of the answers to a new user's questions will be in the top two google search results if you append "matlab" to your search. The MathWorks really set the bar on documentation in my opinion. (Of course, I last used MATLAB 3 years ago)