Ignoring certain number of rows while reading text file - matlab

I want to read a text file in MATLAB, skipping a certain number of header rows; that number should not be fixed. Then I want to read a given number of lines, starting from the first row that was not skipped. For example, I may want to ignore the first 7 rows and read the next 100 rows, starting from the 8th.
How can I do that easily?
Thanks!

Assume you have a text file data.txt with N_header header lines followed by rows of 5 integers each, and you want to read N_lines rows from this file.
First open the file so that MATLAB has a file identifier to read from:
FID = fopen('data.txt') % Create a file id
Now you can use textscan to read N_lines lines while skipping N_header header lines:
N_header = 7;
N_lines = 100;
formatSpec = '%d %d %d %d %d'; % Five integers per row separated by whitespace
C = textscan(FID,formatSpec,N_lines,'HeaderLines',N_header);
fclose(FID)
The columns in your text file are stored in C{column number}. If you would rather store each whole line as text, call textscan like this instead of the call above (while the file is still open, i.e. before fclose):
formatSpec = '%s'; % The whole string, i.e. each line
C = textscan(FID,formatSpec,N_lines,'delimiter','\n','HeaderLines',N_header); % Up to the line end '\n'
This stores the lines in C{1}, a cell array with one line per cell.

Use one of the functions that read a file line by line:
http://www.mathworks.com/help/matlab/ref/fgetl.html
http://www.mathworks.com/help/matlab/ref/fgets.html
If you read the file in a loop, skip each unwanted line with continue.
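A minimal sketch of that loop, assuming the same N_header and N_lines names as the textscan answer above. The sample file, its contents and the variable names lines, lineNo and nKept are made up for illustration so that the snippet runs as-is:

```matlab
% Write a small sample file (hypothetical content): 7 header lines, 10 data lines.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'header %d\n', 1:7);
fprintf(fid, '%d %d %d %d %d\n', reshape(1:50, 5, 10));
fclose(fid);

N_header = 7;   % number of leading lines to ignore
N_lines  = 3;   % number of lines to keep after the header

fid = fopen(fname, 'r');
lines  = cell(N_lines, 1);   % preallocated storage for the wanted lines
lineNo = 0;                  % lines read so far
nKept  = 0;                  % lines stored so far
while nKept < N_lines
    tline = fgetl(fid);      % fgetl returns -1 (not a char) at end of file
    if ~ischar(tline)
        break;
    end
    lineNo = lineNo + 1;
    if lineNo <= N_header
        continue;            % unwanted header line: skip it
    end
    nKept = nKept + 1;
    lines{nKept} = tline;
end
fclose(fid);
lines = lines(1:nKept);      % trim in case the file was shorter than expected
delete(fname);
```

Each kept line is still text at this point; something like sscanf(lines{k}, '%d') would turn it into numbers.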

Related

Reading list of values from text file into MATLAB

I have a text file containing two column data, separated by a comma. However, the first 10 lines contain information which I do not need.
The input text file looks as follows:
# PROGRAM NAME
# The first 10 lines are info I don't need
#
#
#
#
892
5
564
1, 0.4377E-014
2, 0.0673E+000
...
I am trying to write code that reads the value pairs, starting on line 11, into a 2-column matrix.
My (failed) attempt thus far is as follows:
fin = fopen(fullfile(cd, file_name), 'r');
tLine = fgets(fin);
while ischar(tLine)
crit_list = [crit_list; tLine(:)];
end
My intention was to delete the first 10 lines of the matrix after the code had executed, and then use str2num on the value pairs, but I'm not sure this would be very efficient.
How can I read this file into MATLAB, starting from the 11th line?
importdata has the ability to skip header lines:
importdata(file_name,delimiter,10); % skip 10 header lines
where you have to specify your delimiter; judging by the file, you'll want delimiter = ',', i.e. a comma.
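As an alternative to importdata, here is a rough sketch using textscan with its 'HeaderLines' option, which gives the two-column matrix directly. The sample file is generated only so the snippet runs on its own; its header text is invented, and the variable name crit_list is borrowed from the question:

```matlab
% Write a sample file (hypothetical content): 10 header lines, then value pairs.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '# header line %d\n', 1:10);
fprintf(fid, '1, 0.4377E-014\n2, 0.0673E+000\n');
fclose(fid);

% Skip the 10 header lines and read two comma-separated numeric columns.
fid = fopen(fname, 'r');
C = textscan(fid, '%f %f', 'Delimiter', ',', 'HeaderLines', 10);
fclose(fid);

crit_list = [C{1}, C{2}];   % N-by-2 numeric matrix
delete(fname);
```

textscan returns one cell per format specifier, so concatenating C{1} and C{2} yields the matrix without any str2num step.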

I want matlab to read a comma separated txt file with 100s of columns

On http://www.mathworks.com/help/matlab/ref/textscan.html, I can see the suggestion:
fileID = fopen('data3.csv');
C = textscan(fileID,'%f %f %f %f','Delimiter',',',...
'MultipleDelimsAsOne',1);
fclose(fileID);
celldisp(C)
I'm not sure whether textscan can also read .txt files, but I can't really write out hundreds of '%f's. Is there a way to do this by giving textscan the dimensions of the matrix in my .txt file? Thanks.
If you have a file that is only numbers, and the text is comma separated (.csv), then you can use csvread:
num_headerlines = 1
C = csvread('C:\users\smith\Documents\data3.csv', num_headerlines, 0)
The last two arguments here are the row and column at which to begin reading and, unlike almost everything else in MATLAB, they are 0-indexed: to start on the first column you pass 0, and to start on the second row you pass 1. This reads as many columns as the file has, without needing a long format specifier.
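If you would rather stay with textscan, the format string does not have to be typed by hand: repmat can build it from the column count. A small sketch, assuming the number of columns nCols is known in advance; the sample file and the names M, A and nCols are made up for illustration:

```matlab
nCols = 100;                          % assumed number of columns in the file
M = reshape(1:3*nCols, nCols, 3).';   % 3-by-nCols sample data

% Write the sample data as comma-separated text.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, [repmat('%g,', 1, nCols-1) '%g\n'], M.');
fclose(fid);

% Build '%f%f...%f' programmatically and read everything back.
fid = fopen(fname, 'r');
C = textscan(fid, repmat('%f', 1, nCols), 'Delimiter', ',', 'CollectOutput', true);
fclose(fid);

A = C{1};                             % 'CollectOutput' merges the columns into one matrix
delete(fname);
```

The file extension does not matter to textscan; it reads any text file that fopen can open.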

create a huge text file in matlab with specific numbers?

How can I create a huge text file in MATLAB with millions of lines, where each line contains the digits 9876543210 followed by 5 spaces, then the digits and 5 spaces again, and so on?
myNumbers= (0:9);
fid = fopen('lastFile.txt','wt');
for i=1:99
fprintf(fid,'%d%d%d%d%d%d%d%d%d%d \r',myNumbers);
end
fclose(fid);
Loopy way:
myNumbers= (0:9);
repeats = 20; %// number of times to repeat array on each line
lines = 99; %// number of lines to write to file
fid = fopen('lastFile.txt','wt');
for n=1:lines
for r=1:repeats
fprintf(fid,'%d%d%d%d%d%d%d%d%d%d     ',myNumbers); %// five trailing spaces, no return just yet
end
fprintf(fid, '\n'); %// or \r if that's what you really want
end
fclose(fid);
I changed the \r to \n because it made more sense to me. Unless you're using Mac OS 9. I also changed the one space at the end of the format specification to 5 spaces, since that's what you said you wanted.
To get the array to repeat on one line, you just have to make sure you don't add the newline until you've got everything you want on that line. Then do that for however many lines you want.
There are other ways to do this, but this is the most straightforward.
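One of those other ways, sketched here under the assumption that every line is identical: build the text of a single line once with sprintf and repmat, then write it repeatedly. The names block and lineText are made up for this example, and the digits are reversed with 9:-1:0 to match the 9876543210 in the question:

```matlab
myNumbers = 9:-1:0;    % digits 9 8 7 ... 0, as asked for in the question
repeats   = 20;        % number of times to repeat the digits on each line
nLines    = 99;        % number of lines to write to the file

block    = [sprintf('%d', myNumbers) '     '];   % ten digits plus five spaces
lineText = repmat(block, 1, repeats);            % one complete line, built once

fid = fopen('lastFile.txt', 'wt');
for n = 1:nLines
    fprintf(fid, '%s\n', lineText);              % one fprintf call per line
end
fclose(fid);
```

This removes the inner loop, so the number of fprintf calls drops from repeats*nLines to nLines, which should matter once nLines reaches the millions.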

Is there a way to only read part of a line of an input file?

I have a routine that opens a lookup table file to see if a certain entry already exists before writing to the file. Each line contains about 2,500 columns of data. I need to check the first 2 columns of each line to make sure the entry doesn't exist.
I don't want to have to read in 2,500 columns for every line just to check 2 entries. I was attempting to use the fscanf function, but it gives me an Invalid size error when I attempt to only read 2 columns. Is there a way to only read part of each line of an input file?
if(exist(strcat(fileDirectory,fileName),'file'))
fileID = fopen(strcat(fileDirectory,fileName),'r');
if(fileID == -1)
disp('ERROR: Could not open file.\n')
end
% Read file to see if line already exists
dataCheck = fscanf(fileID, '%f %f', [inf 2]);
for i=1:length(dataCheck(:,1))
if(dataCheck(i,1) == sawAnglesDeg(sawCount))
if(dataCheck(i,2) == sarjAnglesDeg(floor((sawCount-1)/4)+1))
% This line has already been written in lookup table
lineExists = true;
disp('Duplicate lookup table line found. Skipping...\n')
break;
end
end
end
fclose(fileID);
end
Well, not really.
You should be able to loop over the file, using fscanf to read the first two doubles of each line and then fgetl to read (and discard) the rest of the line, along these lines:
while ~feof(fileID)
dataCheck = fscanf(fileID, '%f', 2); % First two values on the line
fgetl(fileID); % Read remainder of line, discarding it
% Do check here for each line
end
Since it is a text file, you cannot really skip reading characters from the file. For binary files you can use fseek, which jumps around in the file based on a byte count; that works if you know exactly where the next line starts (as a byte offset). For a text file you do not know that, since the lines vary in length. If you saved the data in a binary file instead, it would be possible to do something like that.
What I would probably do: Create two files, the first one containing the two "check-values", that could be read in quickly, and the other one containing the 2500 columns of data, with or without the two "check-values". They should be updated synchronously; when adding one line to the first file, also one line is added to the second file.
And I would definitely make a checkData matrix variable and keep that in memory as long as possible; when adding a new line to the file, also update the checkData matrix, so you should only need to read the file once initially and use the checkData matrix for the rest of the life of your program.
With textscan you can skip fields, parts of fields, or even "rest of line", so I would do this (based on MATLAB help example slightly modified):
fileID = fopen('data.dat');
data = textscan(fileID,'%f %f %*[^\n]');
fclose(fileID);
Then check data (should be the two columns you want) to see if any of those rows matches the requirements.
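A sketch of that check, assuming a whitespace-separated lookup table. The sample file, the names keys and candidate, and the values are all invented for illustration:

```matlab
% Write a tiny lookup table (hypothetical content): two key columns plus extra data.
fname = [tempname '.dat'];
fid = fopen(fname, 'w');
fprintf(fid, '10 1 0.5 0.6 0.7\n');
fprintf(fid, '20 2 0.1 0.2 0.3\n');
fclose(fid);

% Read only the first two fields of every line; %*[^\n] discards the rest.
fileID = fopen(fname, 'r');
data = textscan(fileID, '%f %f %*[^\n]');
fclose(fileID);
delete(fname);

keys = [data{1}, data{2}];            % N-by-2 matrix of the two check columns

candidate  = [20 2];                  % hypothetical entry about to be written
lineExists = ismember(candidate, keys, 'rows');   % true if the pair is already present
```

ismember with 'rows' compares both columns at once and replaces the explicit for loop from the question. Like the == tests there, it is an exact floating-point comparison, so it relies on the values in the file round-tripping exactly.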
As @Jesper Grooss wrote, there is no way to skip the remainder of a line without reading it. With a single text file, the fastest solution would probably consist of:
1. reading the entire file with textscan (one line of text per element of a cell array)
2. appending the new line to the cell array even if it is a duplicate entry
3. removing duplicates from the cell array with unique(cellmatrix) (the 'rows' option is not needed here and does not apply to cell arrays)
4. appending the new line to the text file only if it corresponds to a new entry
The unique step replaces the potentially costly for loop.

Confused with .tsv files in MATLAB (converting to a Matrix?)

I have a .tsv file that I wish to open in MATLAB, however I am having several problems with this.
I have tried the following
fid = fopen('data.tsv');
C = textscan(fid, ['%s' repmat('%f',1,8)], 'HeaderLines', 1);
fclose(fid);
and got some weird values that had nothing to do with my file. I also tried:
data = dlmread('data.tsv', '\t');
and got this
Error using dlmread (line 139)
Mismatch between file and format string.
Trouble reading number from file (row 1u, field 1u) ==> Participant Assessment
Experiment Block Trial
Answer Reaction Timestamp Free Response\n
Is there some way I can get it to ignore the header, or am I doing it totally wrong?
With dlmread you can specify where to start reading in the file. This is one of the few times that MATLAB indexing begins at 0 - [0,0] is the first row, first column. Therefore, to ignore the first row (containing your header):
data = dlmread('data.tsv','\t', 1, 0);
This will only work if all the values (other than the header lines you skip) are numeric.
Your example with textscan also looks fine to me (provided that the format supplied is correct and there is indeed only one header line). C will be a cell array; to obtain the data from each column use C{n} where n is the column number.
Rather than skipping the header line, it's sometimes useful to just read it in to a separate value:
fid = fopen('data.tsv');
C_header = textscan(fid, '%s', 9, 'Delimiter', '\t'); % 9 tab-separated column names, which may contain spaces
C = textscan(fid, ['%s' repmat('%f',1,8)], 'Delimiter', '\t');
fclose(fid);