I'm fairly new to MATLAB. I've searched the csvwrite documentation and some existing web portals but couldn't find a way to pass my variables by value into the output file names when exporting to CSV. Given my script below, I would like the output files to be named something like output_$aa_$dd.csv, where aa and dd are the first and second for-loop counters, respectively.
for aa = 1:27
    for dd = 1:5
        M_Normal = bench(aa,dd).Y;
        for j = 1:300
            randRand = M_Normal(randperm(12000,12000));
            for jj = 1:numel(randMin(:,1)) % loops over the rand numbers
                vv = randMin(jj,1); % gets the value
                randMin(jj,j+1) = min(randRand(1:vv)); % get and store the min of the selection in the matrix
            end
        end
        csvwrite('/home/amir/amir_matlab/sprintf(''%d%d',aa, bb).csv',randMin);
    end
end
String concatenation in MATLAB is done like matrix concatenation. For example:
a='app';
b='le';
c=[a,b] % returns 'apple'
Hence, in your problem, the full path can be formed this way.
['/home/amir/amir_matlab/',sprintf('%d_%d',aa,bb),'.csv']
Furthermore, it is usually best not to specify the file separator explicitly, so that your code can run on other operating systems. I suggest you build the full path as
fullfile('/home','amir','amir_matlab',sprintf('%d_%d.csv',aa,bb))
(keeping '/home' as the first argument preserves the absolute path, which a plain 'home' first argument would turn into a relative path).
Cheers.
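Putting the pieces together with the loops from the question (a minimal sketch; the output_%d_%d.csv naming follows the output_$aa_$dd.csv pattern asked for, and randMin is assumed to be computed as in the question):

```matlab
for aa = 1:27
    for dd = 1:5
        % ... build randMin as in the question ...
        outname = fullfile('/home','amir','amir_matlab', ...
                           sprintf('output_%d_%d.csv', aa, dd));
        csvwrite(outname, randMin); % or writematrix(randMin, outname) in R2019a+
    end
end
```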
New to Matlab so this may be more simple than I'm realising.
I'm working with a large number of text files for some data analysis and would like to separate them into a number of categories. They follow a format like Tp_angle_RunNum.txt and Ts_angle_RunNum.txt. I would like to have 4 different groups: the Tp and Ts files for angle1, and the same for angle2.
Any help would be appreciated!
A few things here:
Remember that your text files are not necessarily going to stay where you process your data, so think about how you want to store the data in memory while you work with it. It may be beneficial to pull all of the data into a MATLAB table. You can read your text files directly into tables with the function readtable(), with columns to capture Ts, Tp, and angle... or angle1 and angle2. You could use logicals for all 4 groups, i.e., a true or false as to which group the data row belongs to. You could also capture the run number so you know exactly which file each row came from. There are lots of ways to store the data, but if you get into very large data sets, tables are far easier to manipulate and work with. They are also very fast, and compact if you use categorical types where applicable.
dir() will likely be a necessary function for you for the dir listing, however there are some alternatives and some considerations based upon your platform. I suggest you take a look at the doc page for dir.
Take advantage of MATLAB strings and vector processing. It's not the only way to do things, but it is by far the easiest. Strings were introduced in R2016b and have improved since, as more and more features and capabilities work with strings. More recently, you can use patterns instead of (or together with) regular expressions to find what you need to process. The doc page I linked above has some great examples, so there's no point in me reinventing that wheel.
fileparts() is also your friend in MATLAB when working with many files. You can use it to separate the path, the filename and the extension. You might use it to simplify your processing.
Regarding vector processing, you can take an entire array of strings and pass it to the functions used with pattern. Or, if you take my suggestion of working with tables, you can get rows of your table that match specific characteristics.
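As a tiny illustration of fileparts (the path here is hypothetical):

```matlab
[folder, name, ext] = fileparts('C:\data\Tp_21_3.txt');
% folder is 'C:\data', name is 'Tp_21_3', ext is '.txt'
```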
Let's look at a few of these concepts together with some sample code. I don't have your files, but I can demonstrate with just the output of the dir command and some dummy files. I'm going to break this into two parts: a dirTable function (a dir wrapper I like to use instead of dir and keep on my path) and a script that uses it. I'd suggest you copy the code and take advantage of code sections to run a section at a time; see the doc page on Create and Run Sections in Code if this is new to you.
dirTable.m
% Filename: dirTable.m
function dirListing = dirTable( names )
    arguments
        names (:,1) string {mustBeText} = pwd()
    end
    dirListing = table();
    for nIdx = 1:numel( names )
        tempListing = dir( names( nIdx ) );
        dirListing = [ dirListing; ...
            struct2table( tempListing, 'AsArray', true ) ]; %#ok<AGROW>
    end
    if ~isempty( dirListing )
        % Adjust some of the types...
        dirListing.name = categorical( dirListing.name );
        dirListing.folder = categorical( dirListing.folder );
        dirListing.date = datetime( dirListing.date );
    end
end
Example script
%% Create some dummy files - assuming windows, or I'd use the "touch" cmd.
cmds = "type nul >> Ts_42_" + (1:3) + ".txt";
cmds = [cmds;"type nul >> Tp_42_" + (1:3) + ".txt"];
cmds = [cmds;"type nul >> Ts_21_" + (1:3) + ".txt"];
cmds = [cmds;"type nul >> Tp_21_" + (1:3) + ".txt"];
for idx = 1:numel(cmds)
    system(cmds(idx));
end
%% Get the directory listing for all the files
% Note, the filenames come out as categoricals by my design, though that
% doesn't help much for this example - in fact, I'll have to cast the
% categoricals to string a few times. That's OK, it's not a heavy lift.
% If you use dir directly, you'd not only be casting to string, but you'd
% also have to deal with the structure and clunky if/else conditions
% everywhere.
listing = dirTable();
%% Define patterns for each of the 4 groups
% - pretending the first code cell doesn't exist.
Tp_Angle1_pattern = "Tp_21";
Ts_Angle1_pattern = "Ts_21";
Tp_Angle2_pattern = "Tp_42";
Ts_Angle2_pattern = "Ts_42";
%% Cycle a group's data, creating a single table from all the files
% I could be more clever here and loop through the patterns as well and
% create a table of tables; however, I am going to keep this code easier
% to read at the cost of repetitiveness. I will however use a local
% function to gather all the runs from one group into a single table.
Tp_Angle1_matches = string(listing.name).startsWith(Tp_Angle1_pattern);
Tp_Angle1_filenames = string(listing.name(Tp_Angle1_matches));
Tp_Angle1_data = aggregateDataFilesToTable(Tp_Angle1_filenames);
% Repeat for each group... Or loop the above code for a single table
% if you loop for a single table, make sure to add column(s) for the group
% information
%% My local function for reading all the files in a group
function data_table = aggregateDataFilesToTable(filenames)
    arguments
        filenames (:,1) string
    end
    % We could assume that, since we're using run numbers at the end of the
    % filename, we'll get the filenames pre-sorted for us. If the numbers
    % are not zero padded, we'd need to extract the run number to determine
    % the sort order of the files to read in. I'm going to be lazy and
    % assume zero padding for simplicity.
    data_table = table();
    for fileIdx = 1:numel(filenames)
        % For the following line, two things:
        % 1) the [data_table;readtable()] syntax appends the table from
        %    readtable to the end of data_table.
        % 2) The comment at the end acknowledges that this variable is
        %    growing in a loop, which is usually not the best practice;
        %    however, since I have no way of knowing the total table
        %    dimensions ahead of time, I cannot pre-allocate the table
        %    before the loop - hence the table() call before the for loop.
        %    If you have a way of knowing this ahead of time, do pre-allocate!
        data_table = [data_table; readtable(filenames(fileIdx))]; %#ok<AGROW>
    end
end
NOTE 1: using empty parens is not necessary on function calls with no parameters; however, I find it to be easier for others to read when they know they are calling a function and not reading a variable.
NOTE 2: I know the dummy files are empty. That won't matter for this example, as an empty table appended to an empty table is another empty table. And the OP's question was about the file manipulation and grouping, etc.
NOTE 3: In case the syntax is new to you, BOTH functions in my example use function argument blocks, which were introduced in R2019b - and they make for much easier to maintain and read code than NOT validating inputs, or using more complex ways of validating inputs. I was going to leave that out of this example, but they were already in my dirTable function, so I figured I'd just explain it instead.
I have 100 data files in a folder called "Experiment1", and I need to take all of the data from them and put them into a single matrix. Each data file contains 15 columns and 40 rows of data.
The order in which the files are in the folder is arbitrary. It doesn't matter in what order they get put into the combined matrix.
I've written some code using dlmread that will do the job:
for i = 1:100
    %% Read in the relevant file.
    filename = ['File_' int2str(i) '.dat'];
    Data = dlmread(fullfile(pwd, 'Experiment1', filename));
    %% Put all the data I need in a separate matrix
    NeededData(1+((i-1)*40):i+((i-1)*40)-i+40, 1:15) = Data(:, 1:15);
end
However, there are two things I don't like about my code.
The files have random names at present, and I'd need to manually change all their names to "File_1.dat", "File_2.dat", etc.
The code is cumbersome and hard to read.
How could I do things better?
Since you've fixed the problem of defining the names of the files to be read with dir, you can improve the way you add the read data (Data) to the output matrix (NeededData).
You can simultaneously read the input files and add the data to the output matrix by inserting the call to dlmread directly in the assignment statement:
files = dir('*.dat');
n_files = length(files);
% Initialize the output matrix as empty
NeededData_0 = [];
for i = 1:n_files
    % Simultaneously read input file and append data to the output matrix
    NeededData_0 = [NeededData_0; dlmread(files(i).name)]; %#ok<AGROW>
end
In case you prefer working with the indices (as in your original approach), since you know in advance that all the files have the same number of rows (40), you can simplify the notation as follows:
files = dir('*.dat');
n_files = length(files);
% Define the number of rows in each input file
n_rows = 40;
% Define the number of columns in each input file
n_col = 15;
NeededData_2 = nan(n_rows*n_files, n_col);
% Define the sequence of starting rows
r_list = 1:n_rows:n_rows*n_files;
for i = 1:n_files
    Data = dlmread(files(i).name);
    NeededData_2(r_list(i):r_list(i)+n_rows-1, :) = Data;
end
Hope this helps.
Using the suggestion to use dir present in the answers I have made the following code, which is clearly an improvement on my earlier effort. I would welcome further improvements, or suggestions for alternative approaches.
files = dir('*.dat');
for i = 1:length(files)
    %% Read in the relevant file.
    Data = dlmread(files(i).name);
    %% Put all the data I need in a separate matrix
    NeededData(1+((i-1)*40):i+((i-1)*40)-i+40, 1:15) = Data(:, 1:15);
end
I have a text file with several HEX values following a format like so:
%
AAAAAAAA
%
AAAAAAAB
and I am trying to use the fgetl() function in MATLAB to obtain the size of the HEX values (for a purpose of which I'm not entirely certain of... if it is important to you, I'll try to decipher what they were doing). Currently, this is what is being attempted:
folder = 'FolderA\hexdata.txt';
fidr = fopen(folder);
while ~feof(fidr)
    get = fgetl(fidr);
    hexdata = get;
    if strncmp(get, '%', 1)
        time = time + .5;
        continue
    elseif size(get) < 8
        continue
    end
    % Do stuff here
end
For some reason, fgetl is returning -1 every time which I know means the line it is reading only contains the end-of-file marker. Is there something obvious I am doing wrong that I just don't see? I'm not the strongest MATLAB coder by any stretch of the imagination, so it is very possible I am missing something obvious.
Take a look at your filename folder: a separator is missing. Use fullfile to get a proper path.
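For example, a minimal sketch of the fix (folder and file names taken from the question):

```matlab
filename = fullfile('FolderA', 'hexdata.txt'); % inserts the OS-appropriate separator
fidr = fopen(filename, 'r');
if fidr == -1
    error('Could not open %s', filename);
end
```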
I have 11x11 matrices and I saved them as .mat files from F01_01 to F11_11.
I have to run a function Func on each file. Since it takes a long time, I want to write a script to run the function automatically:
for i = 01:11
    for j = 01:11
        filename = ['F', num2str(i), '_', num2str(j), '.mat'];
        load(filename);
        Func(Fi_j); % run the function for each file Fi_j
    end
end
But it doesn't work: MATLAB cannot find the .mat files.
Could somebody please help?
The problem
i=01;
j=01;
['F',num2str(i), '_', num2str(j),'.mat']
evaluates to
F1_1.mat
and not to
F01_01.mat
as expected.
The reason for this is that i=01 is a double-type assignment and i equals 1 - there are no leading zeros for these types of variables.
A solution
a possible solution for the problem would be
for ii = 1:11
    for jj = 1:11
        filename = sprintf('F%02d_%02d.mat', ii, jj);
        load(filename);
        Func(Fi_j); % run the function for each file Fi_j
    end
end
Several comments:
Note the use of sprintf to format the double ii and jj with leading zero using %02d.
You can use the second argument of num2str to format its output, e.g.: num2str(ii,'%02d').
It is a good practice to use string formatting tools when dealing with strings.
It is better practice in MATLAB not to use i and j as loop counters, since their default value is sqrt(-1) (the imaginary unit).
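A quick sanity check of the two padding approaches mentioned above (the values are arbitrary):

```matlab
sprintf('F%02d_%02d.mat', 3, 11)  % returns 'F03_11.mat'
num2str(3, '%02d')                % returns '03'
```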
Here is an alternate solution. Note that the solution by @Shai is more easily expanded to multiple digits, but this one requires less understanding of string formatting.
for i = 1:11
    for j = 1:11
        filename = ['F', num2str(floor(i/10)), num2str(mod(i,10)), '_', ...
                    num2str(floor(j/10)), num2str(mod(j,10)), '.mat'];
        load(filename);
        Func(Fi_j); % run the function for each file Fi_j
    end
end
num2str can zero-pad to fill the field. In the example below, 4 is the desired field width plus 1 (the space flag reserves one position for the sign, which num2str then trims):
num2str(1,'% 04.f')
ans = 001
I've written a script that saves its output to a CSV file for later reference, but the second script for importing the data takes an ungainly amount of time to read it back in.
The data is in the following format:
Item1,val1,val2,val3
Item2,val4,val5,val6,val7
Item3,val8,val9
where the headers are on the left-most column, and the data values take up the remainder of the row. One major difficulty is that the arrays of data values can be different lengths for each test item. I'd save it as a structure, but I need to be able to edit it outside the MATLAB environment, since sometimes I have to delete rows of bad data on a computer that doesn't have MATLAB installed. So really, part one of my question is: Should I save the data in a different format?
Second part of the question:
I've tried importdata, csvread, and dlmread, but I'm not sure which is best, or if there's a better solution. Right now I'm using my own script using a loop and fgetl, which is horribly slow for large files. Any suggestions?
function [data, headers] = csvreader(filename) %V1_1
    fid = fopen(filename, 'r');
    data = {};
    headers = {};
    count = 1;
    while 1
        textline = fgetl(fid);
        if ~ischar(textline), break, end
        nextchar = textline(1);
        idx = 1;
        while nextchar ~= ','
            headers{count}(idx) = textline(1);
            idx = idx + 1;
            textline(1) = [];
            nextchar = textline(1);
        end
        textline(1) = [];
        data{count} = str2num(textline);
        count = count + 1;
    end
    fclose(fid);
(I know this is probably terribly written code - I'm an engineer, not a programmer, please don't yell at me - any suggestions for improvement would be welcome, though.)
It would probably make the data easier to read if you could pad the file with NaN values when your first script creates it:
Item1,1,2,3,NaN
Item2,4,5,6,7
Item3,8,9,NaN,NaN
or you could even just print empty fields:
Item1,1,2,3,
Item2,4,5,6,7
Item3,8,9,,
Of course, in order to pad properly you would need to know what the maximum number of values across all the items is before hand. With either format above, you could then use one of the standard file reading functions, like TEXTSCAN for example:
>> fid = fopen('uneven_data.txt','rt');
>> C = textscan(fid,'%s %f %f %f %f','Delimiter',',','CollectOutput',1);
>> fclose(fid);
>> C{1}
ans =
'Item1'
'Item2'
'Item3'
>> C{2}
ans =
1 2 3 NaN %# TEXTSCAN sets empty fields to NaN anyway
4 5 6 7
8 9 NaN NaN
Instead of parsing the string textline one character at a time, you could use strtok to break the string up. For example:
stringParts = {};
tline = fgetl(fid);
if ~ischar(tline), break, end
i = 1;
while 1
    [stringParts{i}, r] = strtok(tline, ',');
    tline = r;
    i = i + 1;
    if isempty(r), break; end
end
% store the header
headers{count} = stringParts{1};
% convert the data into numbers
for j = 2:length(stringParts)
    data{count}(j-1) = str2double(stringParts{j});
end
count = count + 1;
I've had the same problem with reading CSV data in MATLAB, and I was surprised by how little support there is for this, but then I found the Import Data tool. I'm on R2015b.
On the top bar in the "Home" tab, click on "Import Data" and choose the file you'd like to read. An app window will come up like this:
Import Data tool screenshot
Under "Import Selection" you have the option to "generate function", which gives you quite a bit of customization options, including how to fill empty cells, and what you'd like the output data structure to be. Plus it's written by MathWorks, so it's probably utilizing the fastest available method to read csv files. It was almost instantaneous on my file.
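If you prefer to stay in code rather than use the app, a similar effect can be had with detectImportOptions and readtable (R2016b+). A sketch, with 'mydata.csv' as a hypothetical file name:

```matlab
opts = detectImportOptions('mydata.csv'); % infer delimiter, types, and headers
opts.MissingRule = 'fill';                % fill empty cells rather than erroring
T = readtable('mydata.csv', opts);        % read everything into a table
```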
Q1) If you know the max number of columns, you can fill empty entries with NaN.
Also, if all values are numerical, do you really need the "Item#" column? If you do, you can keep only the "#" part, so all data is numerical.
Q2) The fastest way to read numerical data from a file without MEX-files is csvread.
I try to avoid using strings in csv files, but if I have to, I use my csv2cell function:
http://www.mathworks.com/matlabcentral/fileexchange/20135-csv2cell