Read through and save files with different filenames - matlab

I have a list of CSV files. The filenames of these files have been stored in the form [year '_MDA8_mat.dat']. I want to read in each of these files into MATLAB and save the output. How can I write the code so that each year is considered in turn and the output .mat file will be saved for each year?
Here's what I have for reading in one of the files:
flist = fopen('2006_MDA8_mat.dat'); % Open the list of file names - CSV files of states with data under consideration
nt = 0; % Counter will go up one for each file loaded
while ~feof(flist) % While end of file has not been reached
for i = 1:27299 % Number of files
fname = fgetl(flist); % Reads next line of list, which is the name of the next data file
disp(fname); % Stores name as string in fname
nt = nt+1; % Time index
load (fname, 'site_data'); % Load current file. It is all the data for one site for one year
O3_data{i} = site_data;
% Do some more stuff
end
save ('2006_MDA8_1990_2014.mat', '-v7.3')
I tried to write a for loop like this:
year = 2006:2014
for y = 1:9
flist = fopen([year(y) '_MDA8_mat.dat']);
nt = 0; % Counter will go up one for each file loaded
while ~feof(flist) % While end of file has not been reached
for i = 1:1500 % Number of files
% Same as above
end
end
save ([year '_MDA8_1990_2014.mat'], '-v7.3')
end
However, when I run this, it doesn't do the same thing as it did for the one file script. I'm not quite sure where the error occurs, but MATLAB tells me there's an error with feof, which doesn't seem to make sense.

When combining numbers and strings, you need to do a num2str on the number:
[num2str(year) '_MDA8_1990_2014.mat']

Related

How to import and save in Matlab Multiple Text Files creating a Matrix for each files

I have a very large data set which is divided in folders, I have 100 folders with approximately 200 text files each. I have been trying the for loop first of all importing one and then in another command importing the rest. But I am not interested in a dataarray but rather conserving each file with its name as I have to then match the dates among all the files and each file does not have the same amount of columns.
Each text file has is like the one I have attached, where the data I need is from the row 23 until column 13.
The data names are saves as 010010.txt, 010030.txt, 010050.txt ......until 014957.txt , they are not sequential
Apart from this I have created a script for importing one file but I would like to know how to repeat the same script for the rest.
filename = 'C:*\010010.txt';
startRow = 22;
formatSpec = '%4f%6f%6f%6f%6f%6f%6f%6f%6f%6f%6f%6f%6f%[^\n\r]';
fileID = fopen(filename,'r');
dataArray = textscan(fileID, formatSpec, 'Delimiter', '', 'WhiteSpace', '', 'HeaderLines' ,startRow-1, 'ReturnOnError', false);
fclose(fileID);
Untitled (010010) = [dataArray{1:end-1}];
I would like to repeat the same import process but for the rest files. I would appreciate any suggestion
The text files have the following format:
I only need from row 23 and column 13 and each txt file has different number of rows as some have data from 1992 - 2014 and other have only 2000 - 2014. The first column is the year and column 2 to 13 are months.
I guess you know the basepath under which all your folders are. You can then use something like this:
% First find all folders
folders = cell(0); % empty cell to save folder names
nFolders = 0;
allFolders = ls(basePath); % find all files and folders
for k=1:size(allFolders,1)
curFolder = fullfile(basePath,strtrim(allFolders(k,:)));
if isdir(curFolder) % find out if it is a folder
if ~(allFolders(k,1) == '.') % ignore '.' and '..'
folders{nFolders+1,1} = curFolder; % Save folder path
nFolders = nFolders + 1;
end
end
end
% Then find all files inside these folders
files = cell(0); % empty cell array for file names
nFiles = 0;
for k=1:nFolders % go through all folders
allFiles = ls(folders{k,1});
for l=1:size(allFiles,1) % go through all found files/subfolders
curFile = fullfile(folders{k},strtrim(allFiles(l,:)));
if ~isdir(curFile) % only select files
files{nFiles+1,1} = curFile; % and save it to the cell
nFiles = nFiles + 1;
end
end
end
Now you can iterate through the files cell and read all files according to your script. I see you are interested in the file name. You can extract the file name by
[path,filename,extension] = fileparts(files{k,1});
To import text files, you can use dlmread, which I think is more intuitive than textscan (but has more limitations, of course). For that you don't have to open the file using fopen, you can directly supply the file name.
value = dlmread(fileName,' ',[23,13,23,13]);
The delimiter is now a white space and only the value at row=23 / col=13 is read. Note that the range starts at row/col=0, not 1 like normally in Matlab - so maybe you'll have to change it to [22,12,22,12].

Loop through .txt files in a folder and search for certain characters in the file in MATLAB

How can I loop through .dbl and .txt files in a folder, find .txt files with a specific number and condition (e.g. 'laser on') and then do things to the associated .dbl file (if the names are similar)?
I don't really have much and I can't seem to access the file names I want to look through using getfield(files, 'name') so I'm really stuck. Here is what I have so far so as to give some more structure to my question.
% specify folder with the load function to manipulate .dbl files
folder = 'some folder';
% specify folder that has the data
folder2 = 'some other folder';
cd(folder);
addpath(folder2);
% specify parameters implemented in data collection program
start_delay = 0; % in ps
step_size = 20; % in ps
n_steps = 30;
% loop through folders
files = dir(folder2);
for i = 1:size(files,1)
if i == '*.txt
% find string 'ON'
% find a number in .txt file
% for .txt files with string 'ON', look for .txt files for delay =0, then 20, then 40, etc.
% find associated .dbl file
% manipulate .dbl file
end
end
The things you are looking for are the *.xxx comand for dir and strfind, take a look at this example:
txtList=dir('*.txt');
sblList=dir('*.dbl');
for ii=1:length(txtList)
fid=fopen(txtList(ii).name);
C= textscan(fid,'%s'); %depends on your formating
C=C{1}{1}; %supposing you .txt is really just one string
fclose(fid);
if (sum(strfind(C,'ON'))) %for .txt files with string 'ON'...
Equals0=strfind(C,'0'); % ... look for =0,...
Equals20=strfind(C,'20'); %then =20
%... etc
end
%you get the idea...
%do whatever you want with dblList
end

Find which line of a .dat file MATLAB is working on

I have a MATLAB script that reads a line from a text file. Each line of the text file contains the filename of a CSV. I need to keep track of what line MATLAB is working on so that I can save the data for that line in a cell array. How can I do that?
To illustrate, the first few lines of my .dat file looks like this:
2006-01-003-0010.mat
2006-01-027-0001.mat
2006-01-033-1002.mat
2006-01-051-0001.mat
2006-01-055-0011.mat
2006-01-069-0004.mat
2006-01-073-0023.mat
2006-01-073-1003.mat
2006-01-073-1005.mat
2006-01-073-1009.mat
2006-01-073-1010.mat
2006-01-073-2006.mat
2006-01-073-5002.mat
2006-01-073-5003.mat
I need to save the variable site_data from each of these .mat files into a different cell of O3_data. Therefore, I need to have a counter so that O3_data{1} is the data from the first line of the text file, O3_data{2} is the data from the second line, etc.
This code works, but it's done without using the counter so I only get the data for one of the files I'm reading in:
year = 2006:2014;
for y = 1:9
flist = fopen(['MDA8_' num2str(year(y)) '_mat.dat']); % Open the list of file names - CSV files of states with data under consideration
nt = 0; % Counter will go up one for each file loaded
while ~feof(flist) % While end of file has not been reached
fname = fgetl(flist);
disp(fname); % Stores name as string in fname
fid = fopen(fname);
while ~feof(fid)
currentLine = fgetl(fid);
load (fname, 'site_data'); % Load current file. It is all the data for one site for one year
O3_data = site_data;
% Do other stuff
end
fclose(fid);
end
fclose(flist);
end
If I add the time index part, MATLAB is telling me that Subscript indices must either be real positive integers or logicals. nt is an integer so I don't know what I'm doing wrong. I need the time index so that I can have O3_data{i} in which each i is one of the files I'm reading in.
year = 2006:2014;
for y = 1:9
flist = fopen(['MDA8_O3_' num2str(year(y)) '_mat.dat']); % Open the list of file names - CSV files of states with data under consideration
nt = 0;
while ~feof(flist) % While end of file has not been reached
fname = fgetl(flist);
fid = fopen(fname);
while ~feof(fid)
currentLine = fgetl(fid);
nt = nt+1; % Time index
load (fname, 'site_data'); % Load current file. It is all the data for one site for one year
O3_data{nt} = site_data;
% Do other stuff
end
fclose(fid);
end
fclose(flist);
end
Try the following:
years = 2006:2014;
for y=1:numel(years)
% read list of filenames for this year (as a cell array of strings)
fid = fopen(sprintf('MDA8_O3_%d_mat.dat',years(y)), 'rt');
fnames = textscan(fid, '%s');
fnames = fnames{1};
fclose(fid);
% load data from each MAT-file
O3_data = cell(numel(fnames),1);
for i=1:numel(fnames)
S = load(fnames{i}, 'site_data');
O3_data{i} = S.site_data;
end
% do something with O3_data cell array ...
end
Try the following - note that since there is an outer for loop, the nt variable will have to be initialized outside of that loop so that we don't overwrite data from previous years (or previous j's). We can avoid the inner while loop since the just read file is a *.mat file and we are using the load command to load its single variable into the workspace.
year = 2006:2014;
nt = 0;
data_03 = {}; % EDIT added this line to initialize to empty cell array
% note also the renaming from 03_data to data_03
for y = 1:9
% Open the list of file names - CSV files of states with data under
% consideration
flist = fopen(['MDA8_O3_' num2str(year(y)) '_mat.dat']);
% make sure that the file identifier is valid
if flist>0
% While end of file has not been reached
while ~feof(flist)
% get the name of the *.mat file
fname = fgetl(flist);
% load the data into a temp structure
data = load(fname,'site_data');
% save the data to the cell array
nt = nt + 1;
data_03{nt} = data.site_data;
end
fclose(flist); % EDIT moved this in to the if statement
end
end
Note that the above assumes that each *.dat file contains a list of *.mat files as illustrated in your above example.
Note the EDITs in the above code from the previous posting.

Set i = 1: end of file

I am trying to have MATLAB read in a list of files. However, I have several lists of these files and they are all different lengths. How can I set a for loop so that the i goes from 1 to the end of the file? (The line I'm talking about is the 'for i = 1:END OF FILE' part.
To illustrate, the first few lines of my .dat file looks like this:
2006-01-003-0010.mat
2006-01-027-0001.mat
2006-01-033-1002.mat
2006-01-051-0001.mat
2006-01-055-0011.mat
My code looks like this:
for y = 1:9
flist = fopen([num2str(year(y))'_MDA8_mat.dat']); % Open the list of file names - CSV files of states with data under consideration
nt = 0; % Counter will go up one for each file loaded
while ~feof(flist) % While end of file has not been reached
for i = 1:END OF FILE % Number of files CHECK EACH TIME FILE IS MODIFIED
fname = fgetl(flist); % Reads next line of list, which is the name of the next data file
disp(fname); % Stores name as string in fname
nt = nt+1; % Time index
load (fname, 'site_data'); % Load current file. It is all the data for one site for one year
O3_data{i} = site_data;
% Stuff
end
end
Try this code :
for y = 1:9
flist = fopen([num2str(year(y))'_MDA8_mat.dat']); % Open the list of file names - CSV files of states with data under consideration
while ~feof(flist) % While end of file has not been reached
newFileName = fgetl(flist);
fid = fopen(newFileName);
while ~feof(fid)
currentLine = fgetl(fid);
% Do stuff on each sub-file
end
fclose(fid);
end
flcose(flist);
end
There another while check you can do like this :
flist = fopen([num2str(year(y))'_MDA8_mat.dat']);
line = fgetl(flist);
while(ischar(line))
%Do process (like open next file with the line variable)
%read the next line
line = fgetl(flist);
end

Name each variable differently in a loop

I have created a .dat file of file names. I want to read into MATLAB each file in that list and give the data a different name. Currently, each iteration just overwrites the last one.
I found that a lot of people give this answer:
for i=1:10
A{i} = 1:i;
end
However, it isn't working for my problem. Here's what I am doing
flist = fopen('fnames.dat'); % Open the list of file names
nt = 0; % Counter will go up one for each file loaded
while ~feof(flist) % While end of file has not been reached
for i = 1:6 % Number of filenames in the .dat file
% For each file
fname = fgetl(flist); % Reads next line of list, which is the name of the next data file
disp(fname); % Stores name as string in fname
nt = nt+1; % Time index
% Save data
data{i} = read_mixed_csv(fname, '\t'); % Reads in the CSV file% Open file
data{i} = data(2:end,:); % Replace header row
end
end
The code runs with no errors, but only one data variable is saved.
My fnames.dat contains this:
IA_2007_MDA8_O3.csv
IN_2007_MDA8_O3.csv
MI_2007_MDA8_O3.csv
MN_2007_MDA8_O3.csv
OH_2007_MDA8_O3.csv
WI_2007_MDA8_O3.csv
If possible, I would really like to name data something more intuitive. Like IA for the first file, IN for the second and so on. Is there any way to do this?
The last line of the loop is the problem:
data{i} = data(2:end,:);
I don't know what exactly happens I did not run your code, but data(2:end,:) refers to the second to last dataset, not the second to last line.
Try:
thisdata = read_mixed_csv(fname, '\t');
data{i} = thisdata(2:end,:);
If you want to keep track of what data came from which file, save out a second cell array with the names:
thisdata = read_mixed_csv(fname, '\t');
data{i} = thisdata(2:end,:);
names{i} = fname(1:2); % presuming you only need first two letters.
If you need a specific part of the filename that's not always the same length look into strtok or fileparts. Then you can use things like strcmp to check the cell array names for where the data labelled IA or whichever is stored.
As mentioned by #Daniel the simple way to store data of various sizes in a cell array.
data{1} = thisdata(2:end,:)
However, if the names are really important, you could consider using a struct instead. For example:
dataStruct(1).numbers= thisdata(2:end,:);
dataStruct(1).name= theRelevantName
Of course you could also just add them to the cell array:
dataCell{1,1} = thisdata(2:end,:);
dataCell{1,2} = theRelevantName