How to read multiple .dat files and take an average? - matlab

I have 100 .dat files in a folder. Is it possible to read all the files at once with MATLAB and compute the average of the 5th column across those 100 files? Here is a sample of one of the .dat files.

Here is some code to get you started:
%# get list of 100 .dat files
pathToFolder = '.';
files = dir( fullfile(pathToFolder,'*.dat') );
%# read all files
data = cell(numel(files),1);
for i=1:numel(files)
fid = fopen(fullfile(pathToFolder,files(i).name), 'rt');
H = textscan(fid, '%s', 4, 'Delimiter','\n');
C = textscan(fid, repmat('%f ',1,8), 'Delimiter',' ', ...
'MultipleDelimsAsOne',true, 'CollectOutput',true);
fclose(fid);
H = H{1}; C = C{1};
%# store numeric data and ignore the header lines
data{i} = C;
end
%# we assume all tables have the same size
data = cat(3,data{:});
mn = mean(data(:,5,:),3) %# mean of 5th col across 100 files
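
To see what the last two lines do, here is a minimal sketch on made-up data. The three constant 4x8 tables stand in for parsed files and are an assumption for illustration only; the variable names stacked and overall are also hypothetical.

```matlab
% Three small stand-in tables (4 rows x 8 columns) in place of parsed files
data = cell(3,1);
for i = 1:3
    data{i} = i * ones(4,8);          % "file" i holds the constant value i
end

stacked = cat(3, data{:});            % 4 x 8 x 3 array, one page per file
mn = mean(stacked(:,5,:), 3);         % 4 x 1: mean of column 5 across files, per row
overall = mean(mn);                   % a single number, if one average is wanted
```

Note that mn keeps one value per row. If a single average over all rows and all files is what you are after, take the mean of mn as in the last line.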

Take a look at this thread: http://www.mathworks.com/matlabcentral/newsreader/view_thread/161967
Your entire question is answered there. It is also covered in the MATLAB FAQ:
http://matlab.wikia.com/wiki/FAQ
Good luck!

Related

read multiple files in from a directory using matlab

How can I read multiple files in from a directory using MATLAB? Can someone please help correct my code below:
files =dir(fullfile(directory_path,'*.dat'));
numfiles = length('*.dat');
mydat = cell(1, numfiles);
for k = 1:numfiles
mydata{k} = fopen([directory_path,files(k).name]);
values=textscan(mydata{k},'%s','delimiter','\n');
fclose(fid);
%fprintf(values)
....do something with values.....
end
The .dat files each contain many rows and a single column of strings that need to be read in a loop and processed further.
Thanks
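fopen returns a file identifier, which you store in mydata{k} and then try to release with fclose(fid). No variable named fid exists, so the call fails.
What you should do is replace mydata{k} with fid in the fopen, textscan and fclose calls, and probably store the textscan result in mydata{k} instead of values. Note also that the cell array is preallocated as mydat but filled as mydata.
The other bug is in numfiles = .... You will always get numfiles = 5, because length('*.dat') counts the 5 characters in the string '*.dat'.
numfiles = length(files);
would be better, although it would also count any directories whose names match the pattern. Check one of the other questions for how to solve this.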
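
A minimal sketch of the corrected loop, including one way to skip directories by using the isdir field that dir returns. The scratch folder, the folder named old.dat and the file a.dat are made up so that the example runs on its own.

```matlab
% Build a scratch folder with one data file and one directory matching *.dat
directory_path = tempname;                      % hypothetical scratch location
mkdir(directory_path);
mkdir(fullfile(directory_path, 'old.dat'));     % a folder that matches the pattern
fid = fopen(fullfile(directory_path, 'a.dat'), 'w');
fprintf(fid, 'first\nsecond\n');
fclose(fid);

files = dir(fullfile(directory_path, '*.dat'));
files = files(~[files.isdir]);                  % drop directory entries
numfiles = numel(files);
mydata = cell(1, numfiles);
for k = 1:numfiles
    fid = fopen(fullfile(directory_path, files(k).name), 'r');
    values = textscan(fid, '%s', 'Delimiter', '\n');
    fclose(fid);                                % same identifier that fopen returned
    mydata{k} = values{1};                      % cell array of lines for file k
end

rmdir(directory_path, 's');                     % remove the scratch folder
```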
Thanks Zizy Archer.
I solved the problem this way:
files =dir(fullfile(directory_path,'*.dat'));
numfiles = length(files);
for k = 1:numfiles
textFileName = fullfile(directory_path, files(k).name);
fid = fopen(textFileName, 'r');
textData = textscan(fid,'%s','delimiter','\n');
fclose(fid);
data = textData{:,1}
end

Loop through text files, replace consecutive asterisks with 0.00

all,
I am writing a MATLAB program to read in text data and rearrange it, and I have run into a new problem.
When I write the data out to a CSV file, missing values appear at random as ******, as shown below, which causes my program to terminate.
2055 6 17 24.2 29.57 7.02****** 0.99 2.65 2.73 4.09 0.11
Can anyone help me with a small program that loops through all the text files in the folder and replaces the consecutive asterisks with 0.00? The asterisks are always in columns 33 to 38, occupying 6 characters. I want them replaced by two spaces followed by 0.00.
Thanks,
James
For a given text file, you can read it into memory, replace the asterisks with the desired text, and then overwrite the original text file:
filename = 'blah.txt'
% Read it into memory
fid = fopen(filename, 'r');
scanned_fields = textscan(fid, '%s', 'Delimiter','\n');
fclose(fid);
% The first (and only) field of textscan will be our cell array of text
lines = scanned_fields{1};
% Replace the asterisks with two spaces plus 0.00 (same 6-character width)
lines = strrep(lines, '******', '  0.00');
% Overwrite the original file
fid = fopen(filename, 'w');
fprintf(fid, '%s\n', lines{:});
fclose(fid);
To do this for all of the text files in your directory, you can use dir to get a list of files in your current directory that end in ".txt":
files = dir('*.txt');
filenames = {files.name};
And then loop over the files:
for ii = 1:length(filenames)
filename = filenames{ii};
% Read it into memory
fid = fopen(filename, 'r');
scanned_fields = textscan(fid, '%s', 'Delimiter','\n');
fclose(fid);
lines = scanned_fields{1};
% Replace the asterisks with two spaces plus 0.00 (same 6-character width)
lines = strrep(lines, '******', '  0.00');
% Overwrite the original file
fid = fopen(filename, 'w');
fprintf(fid, '%s\n', lines{:});
fclose(fid);
% Go on to the next file
end
And of course, I would recommend creating a backup copy of this directory before running this, just in case something unexpected comes up.
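
The backup can also be made per file from inside MATLAB. Here is a rough sketch, assuming a .bak suffix as the backup naming scheme (my choice, not something from the question) and a made-up one-line sample file. It uses fileread rather than textscan, which is a different technique from the code above: the whole file comes back as one character vector, so blank lines and leading spaces are left untouched.

```matlab
% Create a small sample file so the example runs on its own
filename = [tempname '.txt'];                   % hypothetical sample file
fid = fopen(filename, 'w');
fprintf(fid, '7.02****** 0.99\n');
fclose(fid);

% Keep an untouched copy before overwriting anything
backupName = [filename '.bak'];                 % hypothetical naming scheme
copyfile(filename, backupName);

% Read the whole file, replace, and write it back
text = fileread(filename);
text = strrep(text, '******', '  0.00');        % two spaces + 0.00 = 6 characters
fid = fopen(filename, 'w');
fprintf(fid, '%s', text);
fclose(fid);
```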

Text Scanning to read in unknown number of variables and unknown number of runs

I am trying to read in a csv file which will have the format
Var1 Val1A Val1B ... Val1Q
Var2 Val2A Val2B ... Val2Q
...
And I will not know ahead of time how many variables (rows) or how many runs (columns) will be in the file.
I have been trying to get text scan to work but no matter what I try I cannot get either all the variable names isolated or a rows by columns cell array. This is what I've been trying.
fID = fopen(strcat(pwd,'/',inputFile),'rt');
if fID == -1
disp('Could not find file')
return
end
vars = textscan(fID, '%s,%*s','delimiter','\n');
fclose(fID);
Does anyone have a suggestion?
If the file has the same number of columns in each row (you just don't know how many to begin with), try the following.
First, figure out how many columns by parsing just the first row and find the number of columns, then parse the full file:
% Open the file, get the first line
fid = fopen('myfile.txt');
line = fgetl(fid);
fclose(fid);
tmp = textscan(line, '%s');
% tmp{1} holds one cell per whitespace-separated field, so its length is the number of columns
n = length(tmp{1});
% Now scan the file
fid = fopen('myfile.txt');
tmp = textscan(fid, repmat('%s ', [1, n]));
fclose(fid);
For any given file, do all the lines have the same number of fields? If they do, you could start by reading in the first line, use that to count the number of fields, and then use textscan to read in the file.
fID = fopen(strcat(pwd,'/',inputFile),'rt');
firstLine = fgetl(fID);
numFields = length(strfind(firstLine,' ')) + 1;
fclose(fID);
formatString = repmat('%s',1,numFields);
fID = fopen(strcat(pwd,'/',inputFile),'rt');
vars = textscan(fID, formatString, 'Delimiter', ' ');
fclose(fID);
Now you will have a cell array whose first entry holds the variable names and whose remaining entries hold the observations, one column per cell.
In this case I assumed the delimiter was a space even though you said it was a CSV file. If it really is commas, count the commas in the first line instead and pass 'Delimiter',',' to textscan.
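
For the comma case, a sketch along the same lines could look like this. It assumes the first field of every row is the variable name and all remaining fields are numeric; the sample file contents and the names cols, varNames and values are made up for illustration.

```matlab
% Write a small comma-separated sample file so the example runs on its own
inputFile = [tempname '.csv'];                  % hypothetical sample file
fid = fopen(inputFile, 'w');
fprintf(fid, 'Var1,1,2,3\nVar2,4,5,6\n');
fclose(fid);

fID = fopen(inputFile, 'rt');
firstLine = fgetl(fID);
numFields = numel(strfind(firstLine, ',')) + 1; % commas + 1 = fields per row
frewind(fID);                                   % go back to the start of the file

% One string column for the names, then numeric columns for the runs
formatString = ['%s' repmat('%f', 1, numFields - 1)];
cols = textscan(fID, formatString, 'Delimiter', ',', 'CollectOutput', true);
fclose(fID);

varNames = cols{1};                             % cell array of variable names
values = cols{2};                               % rows-by-runs numeric matrix
```

If the values are not all numeric, keep '%s' for every field as in the answers above and convert afterwards.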

reading all the text files in a folder in same order they appear into Matlab

I currently have 20 text files named file1 to file20. I am reading them into MATLAB using
filePattern = fullfile(myFolder, '*.txt');
dataFiles = dir(filePattern);
for k = 1:length(dataFiles)
baseFileName = dataFiles(k).name;
fullFileName = fullfile(myFolder, baseFileName);
fid = fopen(fullFileName, 'r');
line = fgetl( fid );
while ischar( line )
tks = regexp( line, '\[([^,]+),([^\]]+)\]([^\[]+)\[([^\]]+)\]([^\[]+)', 'tokens' );
for ii = 1:numel(tks)
j=j+1;
mat( j ,: ) = str2double( tks{ii} );
end
line = fgetl( fid );
end
fclose( fid );
end
It works, but I need to keep the order in which the text files appear in the folder: the data from file1, then file2, then file3, up to file20.
Instead, the files are read in the order file1 file10 file11 file12 ... file2 file20. dataFiles is a structure in which the files are listed alphabetically. How can I prevent that?
I'd recommend using sort_nat (available on Matlab Central) for this task.
Run this in an empty folder:
% create sample files
for i = 1:20
filename = sprintf('file%d.txt',i);
fclose(fopen(filename, 'w'));
end
% obtain folder contents
files = dir('*.txt');
%{files.name} % -> list of files might be in alphabetical order (depends on OS)
% sort_nat sorts strings containing digits in a way such that the numerical value
% of the digits is taken into account
[~,order] = sort_nat({files.name});
files = files(order);
% check output is in numerical order
{files.name}
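
If you would rather not depend on a File Exchange download, the same ordering can be had with built-in functions when every name follows the fileN.txt pattern from the question. This is a sketch under that assumption only; it is not a general natural sort like sort_nat, and the hard-coded names list stands in for {dataFiles.name}.

```matlab
% Names in the alphabetical order that dir typically returns
names = {'file1.txt','file10.txt','file11.txt','file2.txt','file20.txt','file3.txt'};

% Pull the number out of each name, then sort by that number
nums = cellfun(@(s) sscanf(s, 'file%d'), names);
[~, order] = sort(nums);
sorted = names(order);
```

With the structure from the question, take names = {dataFiles.name} and then reorder with dataFiles = dataFiles(order) before the loop.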

Reading CSV with mixed type data

I need to read the following csv file in MATLAB:
2009-04-29 01:01:42.000;16271.1;16271.1
2009-04-29 02:01:42.000;2.5;16273.6
2009-04-29 03:01:42.000;2.599609;16276.2
2009-04-29 04:01:42.000;2.5;16278.7
...
I'd like to have three columns:
timestamp;value1;value2
I tried the approaches described here:
Reading date and time from CSV file in MATLAB
modified as:
filename = 'prova.csv';
fid = fopen(filename, 'rt');
a = textscan(fid, '%s %f %f', ...
'Delimiter',';', 'CollectOutput',1);
fclose(fid);
But it returns a 1x2 cell whose first element is a{1}='ÿþ2'; the other is empty.
I had also tried to adapt to my case the answers to these questions:
importing data with time in MATLAB
Read data files with specific format in matlab and convert date to matal serial time
but I didn't succeed.
How can I import that csv file?
EDIT: After @macduff's answer, I tried copy-pasting the data shown above into a new file and using:
a = textscan(fid, '%s %f %f','Delimiter',';');
and it works.
Unfortunately that doesn't solve the problem, because I have to process CSV files that are generated automatically, and these seem to be the cause of the strange MATLAB behavior.
What about trying:
a = textscan(fid, '%s %f %f','Delimiter',';');
For me I get:
a =
{4x1 cell} [4x1 double] [4x1 double]
So each element of a corresponds to a column in your csv file. Is this what you need?
Thanks!
Seems you're going about it the right way. The example you provide poses no problems here, I get the output you desire. What's in the 1x2 cell?
If I were you I'd try again with a smaller subset of the file, say 10 lines, and see if the output changes. If yes, then try 100 lines, etc., until you find where the 4x1 cell + 4x2 array breaks down into the 1x2 cell. It might be that there's an empty line or a single empty field or whatever, which forces textscan to collect data in an additional level of cells.
Note that 'CollectOutput',1 will collect the last two columns into a single array, so you'll end up with 1 cell array of 4x1 containing strings, and 1 array of 4x2 containing doubles. Is that indeed what you want? Otherwise, see @macduff's post.
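
One thing worth checking: the characters ÿþ are the bytes FF FE shown as Latin-1, which is the byte order mark of a UTF-16 little-endian file. That would explain why the copy-pasted data works while the generated files do not, though it is a guess from the symptom, not something confirmed in the question. A sketch that reads the raw bytes and decodes them before calling textscan follows; the sample file it writes is synthetic, and the names utf16, bytes and txt are made up.

```matlab
% Write a small UTF-16LE sample file with a byte order mark (synthetic data)
filename = [tempname '.csv'];                   % hypothetical sample file
sample = sprintf(['2009-04-29 01:01:42.000;16271.1;16271.1\n' ...
                  '2009-04-29 02:01:42.000;2.5;16273.6\n']);
% For plain ASCII text, UTF-16LE is each character byte followed by a zero byte
utf16 = reshape([uint8(sample); zeros(1, numel(sample), 'uint8')], 1, []);
fid = fopen(filename, 'w');
fwrite(fid, [uint8([255 254]) utf16], 'uint8'); % FF FE is the BOM (shown as ÿþ)
fclose(fid);

% Read the raw bytes and decode them if the BOM is present
fid = fopen(filename, 'r');
bytes = fread(fid, Inf, 'uint8=>uint8')';
fclose(fid);
if numel(bytes) >= 2 && bytes(1) == 255 && bytes(2) == 254
    txt = native2unicode(bytes(3:end), 'UTF-16LE');   % skip the 2 BOM bytes
else
    txt = char(bytes);                                % assume a single-byte encoding
end

% textscan also accepts a character vector in place of a file identifier
a = textscan(txt, '%s %f %f', 'Delimiter', ';');
```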
I've had to parse large files like this, and I found I didn't like textscan for this job. I just use a basic while loop to parse the file, and I use datevec to extract the timestamp components into a 6-element time vector.
%% Optional: initialize for speed if you have large files
n = 1000 %% <# of rows in file - if known>
timestamp = zeros(n,6);
value1 = zeros(n,1);
value2 = zeros(n,1);
fid = fopen(fname, 'rt');
if fid < 0
error('Error opening file %s\n', fname); % exit point
end
cntr = 0
while true
tline = fgetl(fid); %% get one line
if ~ischar(tline), break; end; % break out of loop at end of file
cntr = cntr + 1;
splitLine = strsplit(tline, ';'); %% split the line on ; delimiters
timestamp(cntr,:) = datevec(splitLine{1}, 'yyyy-mm-dd HH:MM:SS.FFF'); %% using datevec to parse time gives you a standard timestamp vector
value1(cntr) = str2double(splitLine{2}); %% convert the text field to a number
value2(cntr) = str2double(splitLine{3});
end
%% Concatenate at the end if you like (keep only the rows actually read)
result = [timestamp(1:cntr,:) value1(1:cntr) value2(1:cntr)];