Find which line of a .dat file MATLAB is working on

I have a MATLAB script that reads a line from a text file. Each line of the text file contains the filename of a CSV. I need to keep track of what line MATLAB is working on so that I can save the data for that line in a cell array. How can I do that?
To illustrate, the first few lines of my .dat file look like this:
2006-01-003-0010.mat
2006-01-027-0001.mat
2006-01-033-1002.mat
2006-01-051-0001.mat
2006-01-055-0011.mat
2006-01-069-0004.mat
2006-01-073-0023.mat
2006-01-073-1003.mat
2006-01-073-1005.mat
2006-01-073-1009.mat
2006-01-073-1010.mat
2006-01-073-2006.mat
2006-01-073-5002.mat
2006-01-073-5003.mat
I need to save the variable site_data from each of these .mat files into a different cell of O3_data. Therefore, I need to have a counter so that O3_data{1} is the data from the first line of the text file, O3_data{2} is the data from the second line, etc.
This code works, but it's done without using the counter so I only get the data for one of the files I'm reading in:
year = 2006:2014;
for y = 1:9
flist = fopen(['MDA8_' num2str(year(y)) '_mat.dat']); % Open the list of file names - CSV files of states with data under consideration
nt = 0; % Counter will go up one for each file loaded
while ~feof(flist) % While end of file has not been reached
fname = fgetl(flist);
disp(fname); % Stores name as string in fname
fid = fopen(fname);
while ~feof(fid)
currentLine = fgetl(fid);
load (fname, 'site_data'); % Load current file. It is all the data for one site for one year
O3_data = site_data;
% Do other stuff
end
fclose(fid);
end
fclose(flist);
end
If I add the time index part, MATLAB reports the error "Subscript indices must either be real positive integers or logicals." nt is an integer, so I don't know what I'm doing wrong. I need the time index so that I can have O3_data{i}, in which each i is one of the files I'm reading in.
year = 2006:2014;
for y = 1:9
flist = fopen(['MDA8_O3_' num2str(year(y)) '_mat.dat']); % Open the list of file names - CSV files of states with data under consideration
nt = 0;
while ~feof(flist) % While end of file has not been reached
fname = fgetl(flist);
fid = fopen(fname);
while ~feof(fid)
currentLine = fgetl(fid);
nt = nt+1; % Time index
load (fname, 'site_data'); % Load current file. It is all the data for one site for one year
O3_data{nt} = site_data;
% Do other stuff
end
fclose(fid);
end
fclose(flist);
end

Try the following:
years = 2006:2014;
for y=1:numel(years)
    % read list of filenames for this year (as a cell array of strings)
    fid = fopen(sprintf('MDA8_O3_%d_mat.dat',years(y)), 'rt');
    fnames = textscan(fid, '%s');
    fnames = fnames{1};
    fclose(fid);
    % load data from each MAT-file
    O3_data = cell(numel(fnames),1);
    for i=1:numel(fnames)
        S = load(fnames{i}, 'site_data');
        O3_data{i} = S.site_data;
    end
    % do something with O3_data cell array ...
end
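To see why this approach gives you the line number for free, here is a minimal, self-contained sketch. It writes a small temporary list file (the temporary file and the three names in it are only stand-ins for one of your real .dat lists), reads it with textscan, and shows that the loop index i is the line of the list being processed.

```matlab
% Hypothetical list file, created only so the sketch is self-contained.
listFile = [tempname '.dat'];
fid = fopen(listFile, 'wt');
fprintf(fid, '%s\n', '2006-01-003-0010.mat', ...
                     '2006-01-027-0001.mat', ...
                     '2006-01-033-1002.mat');
fclose(fid);

% Read every line of the list at once. textscan returns a 1x1 cell
% that holds an N-by-1 cell array of strings, hence the fnames{1}.
fid = fopen(listFile, 'rt');
fnames = textscan(fid, '%s');
fnames = fnames{1};
fclose(fid);

% The loop index i is the line number within the list file.
for i = 1:numel(fnames)
    fprintf('line %d -> %s\n', i, fnames{i});
end
delete(listFile);   % remove the temporary list
```

In the real script you would replace the fprintf call inside the loop with the load call and store the result in O3_data{i}.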

Try the following. Note that since there is an outer for loop, the nt variable has to be initialized outside of that loop so that we don't overwrite data from previous years (that is, from previous values of y). We can also drop the inner while loop: each file named in the list is a *.mat file, and the load command is enough to read its single variable into the workspace.
year = 2006:2014;
nt = 0;
data_03 = {}; % EDIT added this line to initialize to empty cell array
% note also the renaming from 03_data to data_03
for y = 1:9
    % Open the list of file names - CSV files of states with data under
    % consideration
    flist = fopen(['MDA8_O3_' num2str(year(y)) '_mat.dat']);
    % make sure that the file identifier is valid
    if flist>0
        % While end of file has not been reached
        while ~feof(flist)
            % get the name of the *.mat file
            fname = fgetl(flist);
            % load the data into a temp structure
            data = load(fname,'site_data');
            % save the data to the cell array
            nt = nt + 1;
            data_03{nt} = data.site_data;
        end
        fclose(flist); % EDIT moved this in to the if statement
    end
end
Note that the above assumes that each *.dat file contains a list of *.mat files as illustrated in your above example.
Note the EDITs in the above code from the previous posting.

Related

how to save multiple Cell array values in one .csv file

I have been working on a database that contains images, their preset values, and other important parameters. Unfortunately, I am not able to save the initial data of, say, 10 images in one .csv file. My code creates the .csv file without errors, but it saves only the last value and overwrites all the previous ones. I also tried a variant that uses sprintf (it is commented out at the end of the code), but it creates a separate .csv file for every iteration. I want a single .csv file containing 7 columns with all the respective values.
My code is below.
Could someone please show me how to write a single .csv file with, for instance, 10 rows (this could grow to hundreds in the final database)?
clc
clear all
myFolder = 'C:\Users\USER\Desktop\PixROIDirectory\PixelLabelData_1';
filePattern = fullfile(myFolder, '*.png'); % Change to whatever pattern you need
theFiles = dir(filePattern);
load('gTruthPIXDATA.mat','gTruth')
gTruth.LabelDefinitions;
for i=1:10
%gTruth.LabelData{i,1};
baseFileName = theFiles(i).name;
fullFileName = fullfile(myFolder, baseFileName);
fprintf(1, 'Now reading %s\n', fullFileName);
imageArray = imread(fullFileName);
oUt = regionprops(imageArray,'BoundingBox');
Y = floor(oUt.BoundingBox);
X_axis = Y(1);
Y_axis = Y(2);
Width = Y(3);
Height = Y(4);
CLASS = gTruth.LabelDefinitions{1,1};
JPG = gTruth.DataSource.Source{i,1};
PNG = gTruth.LabelData{i,1};
OUTPUT = [JPG X_axis Y_axis Width Height CLASS PNG]
% myFile = sprintf('value%d.csv',i);
% csvwrite(myFile,OUTPUT);
end
Try fprintf (https://www.mathworks.com/help/matlab/ref/fprintf.html).
You will need to open your output file for writing once, before the loop; then you can append a line to it on each iteration.
Simple example:
A = [1:10]; % made up a matrix of numbers
fid = fopen('test.csv','w'); % open a blank csv and set as writable
for i = 1:length(A) % loop through the matrix
fprintf(fid,'%i\n',A(i)); % print each integer, then a line break \n
end
fclose(fid); % close the file for writing
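The same pattern extends to rows that mix text and numbers, which is closer to the question. The following is only a sketch: the names, boxes, and labels variables are made-up stand-ins for the values computed in the loop, the column set is shortened to six for brevity, and the output goes to a temporary file.

```matlab
% Hypothetical per-image records standing in for the values from the loop.
names  = {'img1.jpg', 'img2.jpg'};
boxes  = [10 20 30 40; 15 25 35 45];   % columns: X, Y, Width, Height
labels = {'cat', 'dog'};

outFile = [tempname '.csv'];           % temporary output path for the sketch
fid = fopen(outFile, 'w');             % open once, before the loop
fprintf(fid, 'Image,X,Y,Width,Height,Class\n');   % header row
for i = 1:numel(names)
    % one CSV row per iteration: %s for text, %d for integers
    fprintf(fid, '%s,%d,%d,%d,%d,%s\n', names{i}, ...
        boxes(i,1), boxes(i,2), boxes(i,3), boxes(i,4), labels{i});
end
fclose(fid);                           % close once, after the loop
```

Because the file is opened before the loop and closed after it, every iteration adds a row instead of replacing the previous one.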

Read a value from multiple text files and write all into another file

I have a list of sub-folders, and each sub-folder has a text file called simass.txt. From each of the simass.txt files I extract the cell value c{1}{2,3} (as done in the code below) and write it to a file named features.txt, one value after another in a single column.
I am facing a problem where at the end I only have a single value in features.txt, which I believe is due to the values being overwritten. I'm supposed to have 1000 values in features.txt since I have 1000 sub-folders.
What am I doing wrong?
clc; % Clear the command window.
workspace; % Make sure the workspace panel is showing.
format long g;
format compact;
% Define a starting folder wherever you want
start_path = fullfile(matlabroot, 'D:\Tools\Parameter Generation\');
% Ask user to confirm or change.
topLevelFolder = uigetdir(start_path);
if topLevelFolder == 0
return;
end
% Get list of all subfolders.
allSubFolders = genpath(topLevelFolder);
% Parse into a cell array.
remain = allSubFolders;
listOfFolderNames = {};
while true
[singleSubFolder, remain] = strtok(remain, ';');
if isempty(singleSubFolder)
break;
end
listOfFolderNames = [listOfFolderNames singleSubFolder];
end
numberOfFolders = length(listOfFolderNames)
% Process all text files in those folders.
for k = 1 : numberOfFolders
% Get this folder and print it out.
thisFolder = listOfFolderNames{k};
fprintf('Processing folder %s\n', thisFolder);
% Get filenames of all TXT files.
filePattern = sprintf('%s/simass.txt', thisFolder);
baseFileNames = dir(filePattern);
numberOfFiles = length(baseFileNames);
% Now we have a list of all text files in this folder.
if numberOfFiles >= 1
% Go through all those text files.
for f = 1 : numberOfFiles
fullFileName = fullfile(thisFolder, baseFileNames(f).name);
fileID=fopen(fullFileName);
c=textscan(fileID,'%s%s%s','Headerlines',10,'Collectoutput',true);
fclose(fileID);
%celldisp(c) % display all cell values
cellvalue=c{1}{2,3}
filePh = fopen('features.txt','w');
fprintf(filePh,cellvalue);
fclose(filePh);
fprintf(' Processing text file %s\n', fullFileName);
end
else
fprintf(' Folder %s has no text files in it.\n', thisFolder);
end
end
The problem is in the permission you use in fopen. From the documentation:
'w' - Open or create new file for writing. Discard existing contents, if any.
Which means you're discarding the contents every time, and you end up with only the last value. The fastest fix would be changing the permission to 'a', but I would suggest changing the code as follows:
Create cellvalue before the loop, and read c{1}{2,3} into this new cell array.
Perform the writing operation once, after cellvalue is fully populated.
cellvalue = cell(numberOfFiles,1);
for f = 1 : numberOfFiles
...
cellvalue{f} = c{1}{2,3};
end
fopen(...);
fprintf(...);
fclose(...);
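A filled-in version of that outline could look like the sketch below. The extracted cell array is a made-up stand-in for the c{1}{2,3} values read from each simass.txt, and the output goes to a temporary file rather than features.txt. Note that the values are passed to fprintf as data with an explicit '%s\n' format, rather than being used as the format string themselves.

```matlab
% Hypothetical values standing in for c{1}{2,3} from each simass.txt.
extracted = {'0.125', '0.250', '0.375'};

cellvalue = cell(numel(extracted), 1);   % preallocate before the loop
for f = 1:numel(extracted)
    cellvalue{f} = extracted{f};         % collect only, no file I/O here
end

outFile = [tempname '.txt'];             % temporary stand-in for features.txt
fid = fopen(outFile, 'w');               % opened exactly once
fprintf(fid, '%s\n', cellvalue{:});      % the format repeats: one value per line
fclose(fid);
```

With several folders, the collecting loop would span all of them and the write would still happen once at the very end.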

Read through and save files with different filenames

I have a list of CSV files. The filenames of these files have been stored in the form [year '_MDA8_mat.dat']. I want to read in each of these files into MATLAB and save the output. How can I write the code so that each year is considered in turn and the output .mat file will be saved for each year?
Here's what I have for reading in one of the files:
flist = fopen('2006_MDA8_mat.dat'); % Open the list of file names - CSV files of states with data under consideration
nt = 0; % Counter will go up one for each file loaded
while ~feof(flist) % While end of file has not been reached
for i = 1:27299 % Number of files
fname = fgetl(flist); % Reads next line of list, which is the name of the next data file
disp(fname); % Stores name as string in fname
nt = nt+1; % Time index
load (fname, 'site_data'); % Load current file. It is all the data for one site for one year
O3_data{i} = site_data;
% Do some more stuff
end
save ('2006_MDA8_1990_2014.mat', '-v7.3')
I tried to write a for loop like this:
year = 2006:2014
for y = 1:9
flist = fopen([year(y) '_MDA8_mat.dat']);
nt = 0; % Counter will go up one for each file loaded
while ~feof(flist) % While end of file has not been reached
for i = 1:1500 % Number of files
% Same as above
end
end
save ([year '_MDA8_1990_2014.mat'], '-v7.3')
end
However, when I run this, it doesn't do the same thing as it did for the one file script. I'm not quite sure where the error occurs, but MATLAB tells me there's an error with feof, which doesn't seem to make sense.
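When combining numbers and strings, you need to call num2str on the number. Since year is a vector, index it with y as well. This applies to the file name passed to fopen as much as to the one passed to save:
[num2str(year(y)) '_MDA8_1990_2014.mat']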
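A short sketch of what goes wrong, using a variable called years here to avoid any confusion with the built-in year function (that rename is my choice, not part of the question). Concatenating a number with a string converts the number to the character with that code, so the file name is not what you expect. fopen then cannot find the file and returns -1, and calling feof on an invalid file identifier raises an error, which would explain the message you see.

```matlab
years = 2006:2014;
y = 1;

% Number concatenated with text: 2006 becomes the character with code 2006.
wrong = [years(y) '_MDA8_mat.dat'];

% Convert the number to text first ...
right = [num2str(years(y)) '_MDA8_mat.dat'];

% ... or build the whole name with sprintf.
alsoRight = sprintf('%d_MDA8_mat.dat', years(y));

disp(right);
```

Checking the identifier right after fopen (for example, if flist == -1) makes this kind of problem show up at the line that caused it.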

Set i = 1: end of file

I am trying to have MATLAB read in a list of files. However, I have several of these lists and they are all different lengths. How can I set up a for loop so that i goes from 1 to the end of the file? (The line I'm talking about is the 'for i = 1:END OF FILE' part.)
To illustrate, the first few lines of my .dat file look like this:
2006-01-003-0010.mat
2006-01-027-0001.mat
2006-01-033-1002.mat
2006-01-051-0001.mat
2006-01-055-0011.mat
My code looks like this:
for y = 1:9
flist = fopen([num2str(year(y))'_MDA8_mat.dat']); % Open the list of file names - CSV files of states with data under consideration
nt = 0; % Counter will go up one for each file loaded
while ~feof(flist) % While end of file has not been reached
for i = 1:END OF FILE % Number of files CHECK EACH TIME FILE IS MODIFIED
fname = fgetl(flist); % Reads next line of list, which is the name of the next data file
disp(fname); % Stores name as string in fname
nt = nt+1; % Time index
load (fname, 'site_data'); % Load current file. It is all the data for one site for one year
O3_data{i} = site_data;
% Stuff
end
end
Try this code:
for y = 1:9
    flist = fopen([num2str(year(y)) '_MDA8_mat.dat']); % Open the list of file names - CSV files of states with data under consideration
    while ~feof(flist) % While end of file has not been reached
        newFileName = fgetl(flist);
        fid = fopen(newFileName);
        while ~feof(fid)
            currentLine = fgetl(fid);
            % Do stuff on each sub-file
        end
        fclose(fid);
    end
    fclose(flist);
end
There is another while check you can use. fgetl returns -1 (not a string) once the end of the file is reached, so ischar works as the loop condition:
flist = fopen([num2str(year(y)) '_MDA8_mat.dat']);
tline = fgetl(flist);
while ischar(tline)
    % Do process (like open next file with the tline variable)
    % read the next line
    tline = fgetl(flist);
end
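If you still want an index that runs from 1 to the number of lines, you can increment a counter inside the while loop instead of using a for loop. The sketch below assumes nothing about your data: it creates a small temporary list with three placeholder names and reads it back.

```matlab
% Hypothetical list file with three placeholder names.
listFile = [tempname '.dat'];
fid = fopen(listFile, 'wt');
fprintf(fid, '%s\n', 'a.mat', 'b.mat', 'c.mat');
fclose(fid);

flist = fopen(listFile, 'rt');
names = {};
n = 0;                       % counts the lines read so far
tline = fgetl(flist);
while ischar(tline)          % fgetl returns -1 at end of file
    n = n + 1;
    names{n} = tline;        % n is the current line number
    tline = fgetl(flist);
end
fclose(flist);
delete(listFile);

fprintf('Read %d lines\n', n);
```

The loop stops by itself at the end of each list, so lists of different lengths need no special handling.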

Name each variable differently in a loop

I have created a .dat file of file names. I want to read into MATLAB each file in that list and give the data a different name. Currently, each iteration just overwrites the last one.
I found that a lot of people give this answer:
for i=1:10
A{i} = 1:i;
end
However, it isn't working for my problem. Here's what I am doing:
flist = fopen('fnames.dat'); % Open the list of file names
nt = 0; % Counter will go up one for each file loaded
while ~feof(flist) % While end of file has not been reached
for i = 1:6 % Number of filenames in the .dat file
% For each file
fname = fgetl(flist); % Reads next line of list, which is the name of the next data file
disp(fname); % Stores name as string in fname
nt = nt+1; % Time index
% Save data
data{i} = read_mixed_csv(fname, '\t'); % Reads in the CSV file% Open file
data{i} = data(2:end,:); % Replace header row
end
end
The code runs with no errors, but only one data variable is saved.
My fnames.dat contains this:
IA_2007_MDA8_O3.csv
IN_2007_MDA8_O3.csv
MI_2007_MDA8_O3.csv
MN_2007_MDA8_O3.csv
OH_2007_MDA8_O3.csv
WI_2007_MDA8_O3.csv
If possible, I would really like to name data something more intuitive. Like IA for the first file, IN for the second and so on. Is there any way to do this?
The last line of the loop is the problem:
data{i} = data(2:end,:);
I did not run your code, so I don't know exactly what happens, but data(2:end,:) indexes the cell array data itself (its second through last datasets), not the second through last rows of the file you just read.
Try:
thisdata = read_mixed_csv(fname, '\t');
data{i} = thisdata(2:end,:);
If you want to keep track of what data came from which file, save out a second cell array with the names:
thisdata = read_mixed_csv(fname, '\t');
data{i} = thisdata(2:end,:);
names{i} = fname(1:2); % presuming you only need first two letters.
If you need a specific part of the filename that's not always the same length, look into strtok or fileparts. Then you can use things like strcmp to search the cell array names for where the data labelled IA (or whichever) is stored.
As mentioned by @Daniel, the simple way to store data of various sizes is in a cell array.
data{1} = thisdata(2:end,:)
However, if the names are really important, you could consider using a struct instead. For example:
dataStruct(1).numbers= thisdata(2:end,:);
dataStruct(1).name= theRelevantName
Of course you could also just add them to the cell array:
dataCell{1,1} = thisdata(2:end,:);
dataCell{1,2} = theRelevantName
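If you want to reach each dataset by a name such as IA or IN, another option is a struct with dynamic field names. This is only a sketch: the dataByState name is my own, the placeholder matrices stand in for whatever read_mixed_csv returns, and it assumes the part of the filename before the first underscore is a valid MATLAB field name.

```matlab
% Two of the file names from the question's fnames.dat.
fnames = {'IA_2007_MDA8_O3.csv', 'IN_2007_MDA8_O3.csv'};

dataByState = struct();
for i = 1:numel(fnames)
    key = strtok(fnames{i}, '_');         % text before the first underscore
    % Placeholder matrix; the real code would store thisdata(2:end,:) here.
    dataByState.(key) = i * ones(2, 3);
end

disp(fieldnames(dataByState));
```

Afterwards dataByState.IA and dataByState.IN hold the two datasets, and fieldnames(dataByState) lists which states were loaded.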