How to pass multiple comment style to skip the header of a text file? - matlab

I am trying to read hundreds of .dat file by skipping header lines (I do not know how many of them I need to skip beforehand). Header lines very from 1 to 20 and have at beginning either or "$" oder "!".
A sample data (left column - node, right column - microstructure) has always two columns and looks like the following:
!===
!Comment
$Material
1 1.452E-001
2 1.446E-001
3 1.459E-001
I tried the following codeline, assuming I know beforehand that there 3 lines in header:
fid = fopen('Graphite_Node_Test.dat') ;
data = textscan(fid,'%f %f','HeaderLines',3) ;
fclose(fid);
This solution works if the number of header lines is known. How can I change the code so that it can read the .dat file without knowing the number of header lines beginning with either "$" or "!" sign?

Related

MATLAB fwrite\fread issue: two variables are being concatenated

I am reading in a binary EDF file and I have to split it into multiple smaller EDF files at specific points and then adjust some of the values inside. Overall it works quite well but when I read in the file it combines 2 character arrays with each other. Obviously everything afterwords gets corrupted as well. I am at a dead end and have no idea what I'm doing wrong.
The part of the code (writing) that has to contain the problem:
byt=fread(fid,8,'*char');
fwrite(tfid,byt,'*char');
fwrite(tfid,fread(fid,44));
%new number of records
s = records;
fwrite(tfid,s,'*char');
fseek(fid,8,0);
%test
fwrite(tfid,fread(fid,8,'*char'),'*char');
When I use the reader it combines the records (fwrite(tfid,s,'*char'))
with the value of the next variable. All variables before this are displayed correctly. The relevant code of the reader:
hdr.bytes = str2double(fread(fid,8,'*char')');
reserved = fread(fid,44);%#ok
hdr.records = str2double(fread(fid,8,'*char')');
if hdr.records == -1
beep
disp('There appears to be a problem with this file; it returns an out-of-spec value of -1 for ''numberOfRecords.''')
disp('Attempting to read the file with ''edfReadUntilDone'' instead....');
[hdr, record] = edfreadUntilDone(fname, varargin);
return
end
hdr.duration = str2double(fread(fid,8,'*char')');
The likely problem is that your character array s does not have 8 characters in it, but you expect there to be 8 when you read it from the file. Whatever the number of characters in the array is, that's how many values fwrite will write out to the file. Anything less than 8 characters and you'll end up reading part of the next piece of data when you read from the file.
One fix would be to pad s with blanks before writing it:
s = [blanks(8-numel(records)) records];
In addition, the syntax '*char' is only valid when using fread: the * indicates that the output class should be 'char' as well. It's unnecessary when using fwrite.

How to import column of numbers from mixed string numerical text file

Variations of this question have already been asked several times, for example here. However, I can't seem to get this to work for my data.
I have a text file with 3 columns. First and third columns are floating point numbers. Middle column is strings. I'm only interested in getting the first column really.
Here's what I tried:
filename=fopen('heartbeatn1nn.txt');
A = textscan(filename,'%f','HeaderLines',0);
fclose(filename);
When I do this A comes out as just a single number--the first element in the column. How do I get the whole column? I've also tried this with the '.tsv' file extension, same result.
Also tried:
filename=fopen('heartbeatn1nn.txt');
formatSpec='%f';sizeA=[1 Inf];
A = fscanf(filename,formatSpec,sizeA);
fclose(filename);
with same result.
Could the file size be a problem? Not sure how many rows but probably quite a few since file size is 1.7M.
Assuming the columns in your text file are separated by single whitespace characters your format specification should look like this:
A = textscan(filename,'%f %s %f');
A now contains the complete file content. To obtain the first column:
first_col = A{:,1};
Alternatively, you can tell textscan to skip the unneeded fields with the * option:
first_col = cell2mat( textscan(filename, '%f %*s %*f') );
This returns only the first column.

How to avoid the repeated paragraghs of long txt files being ignored for importdata in matlab

I am trying to import all double from a txt file, which has this form
#25x1 string
#9999x2 double
.
.
.
#(repeat ten times)
However, when I am trying to use import Wizard, only the first
25x1 string
9999x2 double.
was successfully loaded, the other 9 were simply ignored
How may I import all the data? (Does importdata has a maximum length or something?)
Thanks
It's nothing to do with maximum length, importdata is just not set up for the sort of data file you describe. From the help file:
For ASCII files and spreadsheets, importdata expects
to find numeric data in a rectangular form (that is, like a matrix).
Text headers can appear above or to the left of the numeric data,
as follows:
Column headers or file description text at the top of the file, above
the numeric data. Row headers to the left of the numeric data.
So what is happening is that the first section of your file, which does match the format importdata expects, is being read, and the rest ignored. Instead of importdata, you'll need to use textscan, in particular, this style:
C = textscan(fileID,formatSpec,N)
fileID is returned from fopen. formatspec tells textscan what to expect, and N how many times to repeat it. As long as fileID remains open, repeated calls to textscan continue to read the file from wherever the last read action stopped - rather than going back to the start of the file. So we can do this:
fileID = fopen('myfile.txt');
repeats = 10;
for n = 1:repeats
% read one string, 25 times
C{n,1} = textscan(fileID,'%s',25);
% read two floats, 9999 times
C{n,2} = textscan(fileID,'%f %f',9999);
end
You can then extract your numerical data out of the cell array (if you need it in one block you may want to try using 'CollectOutput',1 as an option).

Read and parse more than one text file matlab

I have four .txt files. Each one has 250 lines, where each line has 4 values separated by commas as shown below are the first 5 lines in one of the file, but all are of the same structure:
NaN,NaN,NaN,-1
792.98,419.48,333.35,245.63
787.13,408.59,345.05,251.48
798.3,414.17,333.36,245.63
803.61,414.43,333.35,239.78
One of the four files is the reference file, named groundtruth.txt I want to read each line from the three files and compare it with the values found in the same line number in the groudtruth.txt file. And after that save the difference between the values of the ground_truth and each one in a file for further processing, so the result will be that I'll have 3 new different files holding the differences where each file will have 250 lines and each line holds the difference such as the first line of the result file having the difference between the ground_truth and the first file will be like this :79.8,9.42,22.35,10.63
So if anyone could please advise.
If I understand correctly, this should be the thing you are after:
groundtruth = dlmread('groundtruth.txt');
file1 = dlmread('file_01.txt');
file2 = dlmread('file_02.txt');
file3 = dlmread('file_03.txt');
dlmwrite('diff_01.txt', file1 - groundtruth);
dlmwrite('diff_02.txt', file2 - groundtruth);
dlmwrite('diff_03.txt', file3 - groundtruth);

How to get the number of columns of a csv file?

I have a huge csv file that I want to load with matlab. However, I'm only interested in specific columns that I know the name.
As a first step, I would like to just check how many columns the csv file has. How can I do that with matlab?
As Jonesy and erelender suggest, I would think this will do it:
fid=fopen(filename);
tline = fgetl(fid);
fclose(fid);
length(find(tline==','))+1
Since you don't seem to know what kind of carriage return character (or character encoding?) is being used then I would suggest progressively sampling your file until you encounter a recognizable CR character. One way to do this is to loop over something like
A = fscanf(fileID, ['%' num2str(N) 'c'], sizeA);
where N is the number of characters to read. At each iteration test A for presence of carriage return characters, stop if one is encountered. Once you know where the carriage return is just repeat with the right N and perform the length(find...) operation, or alternately accumulate the number of commas at each iteration. You may want to check that your file is being read along rows (is it always?), check a few samples to make sure it is.
1-) Read the first line of file
2-) Count the number of commas, or seperator characters if it is not comma
3-) Add 1 to the count and the result is the number of columns in the file.
If the csv has only numeric value you can use:
M=csvread('file_name.csv');
[row,col]=size(M);