I have a text file containing two column data, separated by a comma. However, the first 10 lines contain information which I do not need.
The input text file looks as follows:
# PROGRAM NAME
# The first 10 lines are info I don't need
#
#
#
#
892
5
564
1, 0.4377E-014
2, 0.0673E+000
...
I am trying to write a code which reads the value pairs starting on Line 11 into a 2 column matrix.
My (failed) attempt thus far is as follows:
fin = fopen(fullfile(cd, file_name), 'r');
tLine = fgets(fin);
while ischar(tLine)
crit_list = [crit_list; tLine(:)];
end
My intention was to delete the first 10 lines of the matrix after the code had executed, and then use str2num on the value pairs, but I'm not sure this would be very efficient.
How can I read this file into MATLAB, starting from the 11th line?
importdata has the ability to skip header lines:
importdata(file_name,delimiter,10); % skip 10 header lines
where you have to specify your delimiter; judging by the file, you'll want delimiter = ',', i.e. a comma.
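As a rough sketch of how that call might be used end to end: the snippet below writes a small sample file (the name sample_pairs.txt and the header text are made up for illustration) and then reads the value pairs. It assumes importdata returns a struct with a data field when header lines are skipped, and guards against the case where it returns a plain matrix instead.

```matlab
% Build a small sample file: 10 header lines followed by two value pairs.
% The file name and header contents are hypothetical.
file_name = fullfile(tempdir, 'sample_pairs.txt');
fid = fopen(file_name, 'w');
for k = 1:10
    fprintf(fid, '# header line %d\n', k);
end
fprintf(fid, '1, 0.4377E-014\n');
fprintf(fid, '2, 0.0673E+000\n');
fclose(fid);

% Skip the 10 header lines and split the remaining lines on commas
A = importdata(file_name, ',', 10);

% With header lines, importdata normally returns a struct
if isstruct(A)
    pairs = A.data;
else
    pairs = A;
end
```

pairs should then be a two-column numeric matrix with one row per value pair.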
I have a text file with multiple sections of observations. Each time a new section of observations starts, the file contains some information about the data that follows (like a file header).
When I used textscan, I was only able to read the first section. For example, the data is arranged as follows:
1993-01-31 17:00:00.000 031 -61.00
1993-01-31 18:00:00.000 031 -55.00
1993-01-31 19:00:00.000 031 -65.00
Format
Source of Data
Station Name
Data Interval Type 1-hour
Data Type Final
1993-02-01 00:00:00.000 032 -83.00
1993-02-01 01:00:00.000 032 -70.00
1993-02-01 02:00:00.000 032 -64.00
From the above, I only want to read the data lines starting with '1993' and ignore the block of text in the middle.
As you noticed, textscan stops reading when it can't parse the input anymore. You can actually use this to your advantage. For example, in your case, you know that there are 5 lines of garbage between every "good" dataset. So we can run textscan once to get the first set, then run it successive times (with Headerlines set to 5 to ignore those 5 lines) to get each of the "good" datasets in the file. Then concatenate all of the data.
This works because when you use textscan with a file identifier, it does not rewind the file identifier back to the beginning of the file after it returns. It leaves it right where it stopped being able to parse it. Therefore, the next call to textscan starts right where you left off (minus any header lines you specify)
fid = fopen(filename, 'r');
% Don't ignore any lines but read until we stop
data = textscan(fid, formatspec);
% Repeat until we hit the end of the file
while ~feof(fid)
% Skip 5 lines and read until we can't read anymore
newdata = textscan(fid, formatspec, 'HeaderLines', 5);
% Append to existing data
data = cellfun(@(x, y) cat(1, x, y), data, newdata, 'UniformOutput', false);
end
fclose(fid)
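If the number of garbage lines between sections is not always 5, a different technique may be easier: read the file line by line and keep only the lines that start with '1993'. The sketch below swaps textscan for fgetl plus sscanf; the file name sections_demo.txt and the variable names are assumptions made for illustration, and the sample lines mirror the ones in the question.

```matlab
% Write a small sample file with two data sections and a text block between
fname = fullfile(tempdir, 'sections_demo.txt');
sampleLines = { ...
    '1993-01-31 17:00:00.000 031 -61.00', ...
    '1993-01-31 18:00:00.000 031 -55.00', ...
    'Format', ...
    'Source of Data', ...
    'Station Name', ...
    'Data Interval Type 1-hour', ...
    'Data Type Final', ...
    '1993-02-01 00:00:00.000 032 -83.00'};
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', sampleLines{:});
fclose(fid);

% Keep only the lines that begin with '1993' and parse their 8 numeric parts
rows = zeros(0, 8);   % year, month, day, hour, minute, second, day-of-year, value
fid = fopen(fname, 'r');
tline = fgetl(fid);
while ischar(tline)
    if strncmp(tline, '1993', 4)
        vals = sscanf(tline, '%d-%d-%d %d:%d:%f %d %f');
        if numel(vals) == 8          % ignore lines that do not parse fully
            rows(end+1, :) = vals.';
        end
    end
    tline = fgetl(fid);
end
fclose(fid);
```

This is slower than textscan on large files, but it does not depend on how many text lines separate the sections.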
How can I create a huge text file in MATLAB with millions of lines, where each line contains the digits 9876543210 followed by 5 spaces, then the same digits and 5 spaces again, and so on?
myNumbers= (0:9);
fid = fopen('lastFile.txt','wt');
for i=1:99
fprintf(fid,'%d%d%d%d%d%d%d%d%d%d \r',myNumbers);
end
fclose(fid);
Loopy way:
myNumbers= (0:9);
repeats = 20; %// number of times to repeat array on each line
lines = 99; %// number of lines to write to file
fid = fopen('lastFile.txt','wt');
for n=1:lines
for r=1:repeats
fprintf(fid,'%d%d%d%d%d%d%d%d%d%d     ',myNumbers); %// 5 trailing spaces, no return just yet
end
fprintf(fid, '\n'); %// or \r if that's what you really want
end
fclose(fid);
I changed the \r to \n because it made more sense to me. Unless you're using Mac OS 9. I also changed the one space at the end of the format specification to 5 spaces, since that's what you said you wanted.
To get the array to repeat on one line, you just have to make sure you don't add the newline until you've got everything you want on that line. Then do that for however many lines you want.
There are other ways to do this, but this is the most straightforward.
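One of those other ways, sketched here under the same assumptions as the loop above (same digits, 20 repeats per line, 99 lines), is to build a single line as a string once and write it with one fprintf call. The variable names block, oneLine and nLines are made up for this example, and the file is written to tempdir rather than the current folder.

```matlab
myNumbers = 0:9;
repeats = 20;   % number of times to repeat the digits on each line
nLines = 99;    % number of lines to write to the file

% One repeat unit: the ten digits followed by 5 spaces
block = [sprintf('%d', myNumbers), blanks(5)];

% One full line: the unit repeated, then a newline character
oneLine = [repmat(block, 1, repeats), char(10)];

% Write all lines in a single call instead of looping
fid = fopen(fullfile(tempdir, 'lastFile.txt'), 'wt');
fprintf(fid, '%s', repmat(oneLine, 1, nLines));
fclose(fid);
```

For millions of lines the full string may not fit comfortably in memory, so writing it in chunks of a few thousand lines is probably a reasonable compromise.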
I want to read a text file in MATLAB while skipping a certain number of header lines, and that number should not be fixed. Then I want to read a certain number of lines, starting from the first line that was not skipped. For example, I may want to skip the first 7 rows and read the next 100 rows, starting from the 8th.
How can I do that easily?
Thanks!
Assume you have a text file data.txt with N_header header lines, followed by rows of 5 integers each, and you want to read N_lines rows from this file.
First open the file so MATLAB knows which file you need:
FID = fopen('data.txt'); % Create a file id
Now you can use textscan to read N_lines lines while skipping N_header header lines:
N_header = 7;
N_lines = 100;
formatSpec = '%d %d %d %d %d'; % Five integers per row separated by whitespace
C = textscan(FID,formatSpec,N_lines,'HeaderLines',N_header);
fclose(FID)
The columns in your text file are stored in C{column number}. If you would rather store each line as a string, use the following instead (before closing the file):
formatSpec = '%s'; % The whole string, i.e. each line
C = textscan(FID,formatSpec,N_lines,'delimiter','\n','HeaderLines',N_header); % Up to the line end '\n'
This stores every line as a string in the cell array C{1}.
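To try the numeric variant without an existing data.txt, here is a self-contained sketch. The file name data_demo.txt and its contents (7 header lines, then 150 rows of 5 integers) are invented for the example.

```matlab
% Create a demo file: 7 header lines followed by 150 rows of 5 integers
fname = fullfile(tempdir, 'data_demo.txt');
fid = fopen(fname, 'w');
for k = 1:7
    fprintf(fid, 'header line %d\n', k);
end
for k = 1:150
    fprintf(fid, '%d %d %d %d %d\n', k, 2*k, 3*k, 4*k, 5*k);
end
fclose(fid);

N_header = 7;    % lines to skip
N_lines = 100;   % rows to read after the header

fid = fopen(fname, 'r');
C = textscan(fid, '%d %d %d %d %d', N_lines, 'HeaderLines', N_header);
fclose(fid);

% Combine the five columns into one 100-by-5 int32 matrix
M = [C{:}];
```

Because N_header and N_lines are ordinary variables, both can be chosen at run time.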
Using the function which reads line by line:
http://www.mathworks.com/help/matlab/ref/fgetl.html
http://www.mathworks.com/help/matlab/ref/fgets.html
If you read the file in a loop, whenever you reach an unwanted line, just skip it using continue.
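A minimal sketch of that loop, assuming the goal from the question (skip the first 7 lines, keep the next 100). The file name lines_demo.txt, its contents, and the variable names are hypothetical.

```matlab
% Create a demo file: 7 header lines followed by 150 data lines
fname = fullfile(tempdir, 'lines_demo.txt');
fid = fopen(fname, 'w');
for k = 1:7
    fprintf(fid, 'header %d\n', k);
end
for k = 1:150
    fprintf(fid, 'row %d\n', k);
end
fclose(fid);

N_header = 7;
N_lines = 100;
kept = cell(N_lines, 1);   % preallocate storage for the lines we keep
nKept = 0;
lineNo = 0;

fid = fopen(fname, 'r');
while nKept < N_lines
    tline = fgetl(fid);
    if ~ischar(tline), break; end   % fgetl returns -1 at end of file
    lineNo = lineNo + 1;
    if lineNo <= N_header
        continue;                   % unwanted header line, skip it
    end
    nKept = nKept + 1;
    kept{nKept} = tline;
end
fclose(fid);
kept = kept(1:nKept);               % trim if the file was shorter than expected
```

The same continue pattern works for any other test on tline, such as skipping lines that start with a comment character.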
I have a routine that opens a lookup table file to see if a certain entry already exists before writing to the file. Each line contains about 2,500 columns of data. I need to check the first 2 columns of each line to make sure the entry doesn't exist.
I don't want to have to read in 2,500 columns for every line just to check 2 entries. I was attempting to use the fscanf function, but it gives me an Invalid size error when I attempt to only read 2 columns. Is there a way to only read part of each line of an input file?
if(exist(strcat(fileDirectory,fileName),'file'))
fileID = fopen(strcat(fileDirectory,fileName),'r');
if(fileID == -1)
disp('ERROR: Could not open file.\n')
end
% Read file to see if line already exists
dataCheck = fscanf(fileID, '%f %f', [inf 2]);
for i=1:length(dataCheck(:,1))
if(dataCheck(i,1) == sawAnglesDeg(sawCount))
if(dataCheck(i,2) == sarjAnglesDeg(floor((sawCount-1)/4)+1))
% This line has already been written in lookup table
lineExists = true;
disp('Duplicate lookup table line found. Skipping...\n')
break;
end
end
end
fclose(fileID);
end
Well, not really.
You should be able to loop, doing an fscanf of the first two doubles followed by an fgetl to read the rest of the line, i.e. something of the form:
while ~feof(fileID)
    dataCheck = fscanf(fileID, '%f', 2); % Read only the first two values
    fgetl(fileID); % Read remainder of line, discarding it
    if numel(dataCheck) < 2, break; end % Blank or malformed trailing line
    % Do check here for each line
end
Since it is a text file, you cannot really skip reading characters from the file. For binary files you can do an fseek, which can jump around in the file based on a byte count; it can be used if you know exactly where the next line starts (as a byte offset). But for a text file you do not know that, since each line will vary in length. If you save the data in a binary file instead, it would be possible to do something like that.
What I would probably do: Create two files, the first one containing the two "check-values", that could be read in quickly, and the other one containing the 2500 columns of data, with or without the two "check-values". They should be updated synchronously; when adding one line to the first file, also one line is added to the second file.
And I would definitely make a checkData matrix variable and keep that in memory as long as possible; when adding a new line to the file, also update the checkData matrix, so you should only need to read the file once initially and use the checkData matrix for the rest of the life of your program.
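The in-memory idea could look roughly like the sketch below. It assumes the two check values are plain numbers that can be compared exactly; the names newKeys and written are made up, and the actual file write is left as a comment.

```matlab
% checkData holds the first two columns of every line already in the file.
% Here it starts empty; in practice it would be filled once at startup.
checkData = zeros(0, 2);

% Hypothetical candidate entries; the third duplicates the first
newKeys = [10 1; 20 1; 10 1];
written = false(size(newKeys, 1), 1);

for k = 1:size(newKeys, 1)
    key = newKeys(k, :);
    if ismember(key, checkData, 'rows')
        continue;                     % duplicate entry, skip writing
    end
    % ... append the full 2,500-column line to the file here ...
    checkData(end+1, :) = key;        % keep the in-memory copy in sync
    written(k) = true;
end
```

If the angles are computed floating-point values, an exact comparison may miss near-duplicates, so rounding the keys to a fixed precision before storing them is worth considering.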
With textscan you can skip fields, parts of fields, or even "rest of line", so I would do this (based on MATLAB help example slightly modified):
fileID = fopen('data.dat');
data = textscan(fileID,'%f %f %*[^\n]');
fclose(fileID);
Then check data (should be the two columns you want) to see if any of those rows matches the requirements.
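A small end-to-end sketch of that check, using a made-up file lookup_demo.dat with 6 columns instead of 2,500. The target values 20 and 2 are only placeholders for the two entries being looked up.

```matlab
% Write a demo lookup file: each row has two key columns and four data columns
fname = fullfile(tempdir, 'lookup_demo.dat');
rows = [10 1 0.1 0.2 0.3 0.4; ...
        20 2 0.5 0.6 0.7 0.8; ...
        30 3 0.9 1.0 1.1 1.2];
fid = fopen(fname, 'w');
fprintf(fid, '%g %g %g %g %g %g\n', rows.');   % fprintf consumes column-major
fclose(fid);

% Read only the first two fields of each line; %*[^\n] discards the rest
fid = fopen(fname, 'r');
data = textscan(fid, '%f %f %*[^\n]');
fclose(fid);
keys = [data{1}, data{2}];

% Check whether a given pair already exists in the table
lineExists = any(keys(:, 1) == 20 & keys(:, 2) == 2);
```

The vectorized comparison replaces the nested if statements inside the for loop from the question.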
As @Jesper Grooss wrote, there is no way to skip the remainder of a line without reading it. If you stay with a single text file, the fastest solution would probably consist of:
reading the entire file with textscan (one line of text into one cell element of a matrix)
appending the new line to the matrix even if it is a duplicate entry
uniquing the cell matrix with unique(cellmatrix, 'rows')
appending the new line to the text file if it corresponds to a new entry
The uniquing step replaces the putatively costly for loop.
I want to load a csv file in a matrix using matlab.
I used the following code:
formatSpec = ['%*f', repmat('%f',1,20)];
fid = fopen(filename);
X = textscan(fid, formatSpec, 'Delimiter', ',', 'CollectOutput', 1);
fclose(fid);
X = X{1};
The csv file has 1000 rows and 21 columns.
However, the matrix X generated has 2000 rows and 20 columns.
I tried using different delimiters like '\t' or '\n', but it doesn't change.
When I displayed X, I noticed that it displayed the correct csv file but with extra rows of zeros every 2 rows.
I also tried adding the 'HeaderLines' parameters:
`X = textscan(fid, formatSpec1, 'Delimiter', '\n', 'CollectOutput', 1, 'HeaderLines', 1);`
but this time, the result is an empty matrix.
Am I missing something?
EDIT: @horchler
I could read with no problem the 'test.csv' file.
There is no extra comma at the end of each row. I generated my csv file with a python script: I read the rows of another csv file, modified these (selecting some of them and doing arithmetic operations on them) and wrote the new rows on another csv file. In order to do this, I converted each element of the first csv file into floats...
New Edit:
Reading the textscan documentation more carefully, I think the problem is that my input file is neither a textfile nor a str, but a file containing floats
EDIT: three lines from the file
0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,1,0,0,0,2
1,-0.3834323,-1.92452324171,-1.2453254094,0.43455627857,-0.24571121,0.4340657,1,1,0,0,0,0.3517396202,1,0,0,0.3558122164,0.2936975319,0.4105696144,0,1,0
-0.78676,-1.09767,0.765554578,0.76579043,0.76,1,0,0,323124.235998,1,0,0,0,1,0,0,1,0,0,0,2
How about using a regex?
X=[];
fid = fopen(filename);
while 1
fl = fgetl(fid);
if ~ischar(fl), break, end
r = regexp(fl, '([-]*\d+[.]*\d*)', 'match');
r = r(1:21); % keep 21 elements, because your 2nd line somehow has 22
% All lines must have the same number of elements, otherwise you get
% Error: CAT arguments dimensions are not consistent.
X = [X; r];
end
fclose(fid);
Using csvread to read a csv file seems a good option. However, I also tend to read csv files with textscan as files are sometimes badly written. Having more options to read them is therefore necessary.
I face a reading problem like yours when I think the file is written a certain way but it is actually written another way. To debug it I use fgetl and print, for each line read, both the output of fgetl and its double version (see the example below). Examining the double version, you may find which character causes a problem.
In your case, I would first look for multiple occurrences of delimiters (',' and '\t') and, in textscan, I would activate the option 'MultipleDelimsAsOne' (while turning off 'CollectOutput').
fid = fopen(filename);
tline = fgetl(fid);
while ischar(tline)
disp(tline);
double(tline)
pause;
tline = fgetl(fid);
end
fclose(fid);
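Since one of the sample lines in the question appears to have 22 values instead of 21, a quick automated variant of the same debugging loop is to count the fields on each line and report the ones that differ. This is only a sketch: the file fields_demo.csv, the variable names, and the expected count of 3 are made up for the demo (for the file in the question the expected count would be 21).

```matlab
% Write a tiny demo CSV whose second line has one field too many
fname = fullfile(tempdir, 'fields_demo.csv');
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', '0,1,2', '1,2,3,4', '5,6,7');
fclose(fid);

expected = 3;      % expected number of fields per line
badLines = [];     % line numbers with a different field count
lineNo = 0;

fid = fopen(fname, 'r');
tline = fgetl(fid);
while ischar(tline)
    lineNo = lineNo + 1;
    nFields = sum(tline == ',') + 1;   % commas + 1 = number of fields
    if nFields ~= expected
        badLines(end+1) = lineNo;
        fprintf('Line %d has %d fields (expected %d)\n', lineNo, nFields, expected);
    end
    tline = fgetl(fid);
end
fclose(fid);
```

Lines listed in badLines are the ones likely to make textscan shift values into extra rows.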