Create a huge text file in MATLAB with specific numbers?

How can I create a huge text file in MATLAB with millions of lines, where each line contains the numbers 9876543210 followed by 5 spaces, then the numbers and 5 spaces repeated again, for millions of lines?
myNumbers= (0:9);
fid = fopen('lastFile.txt','wt');
for i = 1:99
    fprintf(fid,'%d%d%d%d%d%d%d%d%d%d \r',myNumbers);
end
fclose(fid);

Loopy way:
myNumbers = (0:9);
repeats = 20; %// number of times to repeat the array on each line
lines = 99;   %// number of lines to write to the file
fid = fopen('lastFile.txt','wt');
for n = 1:lines
    for r = 1:repeats
        fprintf(fid,'%d%d%d%d%d%d%d%d%d%d     ',myNumbers); %// 5 trailing spaces, no line break just yet
    end
    fprintf(fid, '\n'); %// or \r if that's what you really want
end
fclose(fid);
I changed the \r to \n because it made more sense to me (unless you're using Mac OS 9). I also changed the single space at the end of the format specification to 5 spaces, since that's what you said you wanted.
To get the array to repeat on one line, you just have to make sure you don't add the newline until you've got everything you want on that line. Then do that for however many lines you want.
There are other ways to do this, but this is the most straightforward.
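For example, a rough sketch of a more vectorized alternative (assuming the same digits, the 5-space separator, and the 20 repeats and 99 lines used above) builds one full line as a string up front and just writes it repeatedly:
digitsStr = sprintf('%d', 0:9);               % '0123456789'
oneLine = repmat([digitsStr '     '], 1, 20); % 20 blocks, each followed by 5 spaces
fid = fopen('lastFile.txt','wt');
for n = 1:99
    fprintf(fid, '%s\n', oneLine);            % write the prebuilt line
end
fclose(fid);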

Related

Ignoring certain number of rows while reading text file

I want to read a text file in MATLAB, but when I read it I want to ignore a certain number of header rows; the number shouldn't be fixed. Then I want to start reading from the first non-skipped row for a certain number of lines. So for example, I may want to ignore the first 7 rows and then read the next 100 rows, starting from the 8th row.
How can I do that easily?
Thanks!
Assume you have a text file data.txt with N_header header lines, containing 5 integers per row, and you want to read N_lines lines from this file.
First open the file so MATLAB knows which file you need:
FID = fopen('data.txt'); % Create a file identifier
Now you can use textscan to read N_lines lines and skipping N_header headerlines:
N_header = 7;
N_lines = 100;
formatSpec = '%d %d %d %d %d'; % Five integers per row separated by whitespace
C = textscan(FID,formatSpec,N_lines,'HeaderLines',N_header);
fclose(FID);
The columns in your text file are stored in C{column number}. If you want to have each line stored in C use:
formatSpec = '%s'; % The whole string, i.e. each line
C = textscan(FID,formatSpec,N_lines,'delimiter','\n','HeaderLines',N_header); % Up to the line end '\n'
This stores every line in the cell array C.
Use the functions that read line by line:
http://www.mathworks.com/help/matlab/ref/fgetl.html
http://www.mathworks.com/help/matlab/ref/fgets.html
If you read the file in a loop, then whenever you reach an unwanted line, just skip it using continue.
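A minimal sketch of that line-by-line approach, assuming you want to skip 7 header lines and keep the next 100 lines:
N_header = 7;
N_lines = 100;
fid = fopen('data.txt');
data = cell(N_lines,1);
lineNo = 0;
kept = 0;
while kept < N_lines
    tline = fgetl(fid);
    if ~ischar(tline), break; end  % stop at end of file
    lineNo = lineNo + 1;
    if lineNo <= N_header
        continue;                  % skip unwanted header line
    end
    kept = kept + 1;
    data{kept} = tline;            % keep this line
end
fclose(fid);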

Is there a way to only read part of a line of an input file?

I have a routine that opens a lookup table file to see if a certain entry already exists before writing to the file. Each line contains about 2,500 columns of data. I need to check the first 2 columns of each line to make sure the entry doesn't exist.
I don't want to have to read in 2,500 columns for every line just to check 2 entries. I was attempting to use the fscanf function, but it gives me an Invalid size error when I attempt to only read 2 columns. Is there a way to only read part of each line of an input file?
if(exist(strcat(fileDirectory,fileName),'file'))
    fileID = fopen(strcat(fileDirectory,fileName),'r');
    if(fileID == -1)
        disp('ERROR: Could not open file.\n')
    end
    % Read file to see if line already exists
    dataCheck = fscanf(fileID, '%f %f', [inf 2]);
    for i=1:length(dataCheck(:,1))
        if(dataCheck(i,1) == sawAnglesDeg(sawCount))
            if(dataCheck(i,2) == sarjAnglesDeg(floor((sawCount-1)/4)+1))
                % This line has already been written in lookup table
                lineExists = true;
                disp('Duplicate lookup table line found. Skipping...\n')
                break;
            end
        end
    end
    fclose(fileID);
end
Well, not really.
You should be able, in a loop, to do an fscanf of the first two doubles, followed by an fgetl to read the rest of the line, i.e. something of the form:
while ~feof(fileID)                       % while there are more lines
    dataCheck = fscanf(fileID, '%f', 2);  % read the first two values
    fgetl(fileID);                        % read remainder of line, discarding it
    % Do check here for each line
end
Since it is a text file, you cannot really skip reading characters from the file. For binary files you can do an fseek, which can jump around in the file based on a byte count; it can be used if you know exactly where the next line starts (in bytes). But for a text file you do not know that, since each line varies in length. If you save the data in a binary file instead, something like that would be possible.
What I would probably do: create two files, the first one containing the two "check values", which can be read in quickly, and the other one containing the 2500 columns of data, with or without the two "check values". They should be updated synchronously: when a line is added to the first file, a line is also added to the second file.
And I would definitely make a checkData matrix variable and keep it in memory as long as possible; when adding a new line to the file, also update the checkData matrix, so you only need to read the file once initially and can use the checkData matrix for the rest of the life of your program.
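A rough sketch of that bookkeeping, reusing the variable names from the question and assuming checkData already holds the first two columns of every line written so far:
% Candidate pair for the line about to be written (names from the question).
newPair = [sawAnglesDeg(sawCount), sarjAnglesDeg(floor((sawCount-1)/4)+1)];
if ~any(checkData(:,1) == newPair(1) & checkData(:,2) == newPair(2))
    checkData(end+1,:) = newPair;  % keep the in-memory copy in sync with the file
    % ... append the full 2500-column row to the lookup table file here
end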
With textscan you can skip fields, parts of fields, or even the "rest of the line", so I would do this (based on a slightly modified example from the MATLAB help):
fileID = fopen('data.dat');
data = textscan(fileID,'%f %f %*[^\n]');
fclose(fileID);
Then check data (should be the two columns you want) to see if any of those rows matches the requirements.
As @Jesper Grooss wrote, there is no way to skip the remainder of a line without reading it. In a single-text-file context, the fastest solution would probably consist of:
reading the entire file with textscan (one line of text into one cell element of a matrix)
appending the new line to the matrix even if it is a duplicate entry
uniquing the cell matrix with unique(cellmatrix, 'rows')
appending the new line to the text file if it corresponds to a new entry
The uniquing step replaces the putatively costly for loop.
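A minimal sketch of that idea, treating each line as a string and applying unique to the resulting cell array (the file name 'lookup.txt' and the variable newLine holding the candidate line are placeholders):
fid = fopen('lookup.txt');
C = textscan(fid, '%s', 'Delimiter', '\n');  % one cell per line
fclose(fid);
existingLines = C{1};
allLines = [existingLines; {newLine}];
if numel(unique(allLines)) > numel(unique(existingLines))
    fid = fopen('lookup.txt', 'a');          % newLine is a genuinely new entry
    fprintf(fid, '%s\n', newLine);
    fclose(fid);
end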

Textscan generates a vector twice the expected size

I want to load a csv file in a matrix using matlab.
I used the following code:
formatSpec = ['%*f', repmat('%f',1,20)];
fid = fopen(filename);
X = textscan(fid, formatSpec, 'Delimiter', ',', 'CollectOutput', 1);
fclose(fid);
X = X{1};
The csv file has 1000 rows and 21 columns.
However, the generated matrix X has 2000 rows and 20 columns.
I tried using different delimiters like '\t' or '\n', but it doesn't change.
When I displayed X, I noticed that it displayed the correct csv file but with extra rows of zeros every 2 rows.
I also tried adding the 'HeaderLines' parameters:
X = textscan(fid, formatSpec1, 'Delimiter', '\n', 'CollectOutput', 1, 'HeaderLines', 1);
but this time, the result is an empty matrix.
Am I missing something?
EDIT: @horchler
I could read the 'test.csv' file with no problem.
There is no extra comma at the end of each row. I generated my csv file with a python script: I read the rows of another csv file, modified them (selecting some of them and doing arithmetic operations on them) and wrote the new rows to another csv file. In order to do this, I converted each element of the first csv file into floats...
New Edit:
Reading the textscan documentation more carefully, I think the problem is that my input file is neither a text file nor a str, but a file containing floats.
EDIT: three lines from the file
0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,1,0,0,0,2
1,-0.3834323,-1.92452324171,-1.2453254094,0.43455627857,-0.24571121,0.4340657,1,1,0,0,0,0.3517396202,1,0,0,0.3558122164,0.2936975319,0.4105696144,0,1,0
-0.78676,-1.09767,0.765554578,0.76579043,0.76,1,0,0,323124.235998,1,0,0,0,1,0,0,1,0,0,0,2
How about using regexp?
X = [];
fid = fopen(filename);
while 1
    fl = fgetl(fid);
    if ~ischar(fl), break, end
    r = regexp(fl,'([-]*\d+[.]*\d*)','match');
    r = r(1:21); % because your 2nd line somehow has 22 elements;
                 % all lines must have the same number of elements or an error
                 % will be thrown: "CAT arguments dimensions are not consistent."
    X = [X; r];
end
fclose(fid);
Using csvread to read a csv file seems a good option. However, I also tend to read csv files with textscan, as files are sometimes badly written, so having more options for reading them is necessary.
I run into reading problems like yours when I think the file is written one way but it is actually written another way. To debug this, I use fgetl and print, for each line read, both the output of fgetl and its double version (see the example below). By examining the double version, you may find which character causes the problem.
In your case, I would first look for multiple occurrences of delimiters (',' and '\t') and, in textscan, I would activate the option 'MultipleDelimsAsOne' (while turning off 'CollectOutput'); see the sketch after the debugging loop below.
fid = fopen(filename);
tline = fgetl(fid);
while ischar(tline)
    disp(tline);
    double(tline)
    pause;
    tline = fgetl(fid);
end
fclose(fid);
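If repeated delimiters do turn out to be the cause, a sketch of the textscan call suggested above (keeping all 21 columns and the filename variable from the question) could look like this:
formatSpec = repmat('%f', 1, 21);   % 21 numeric columns
fid = fopen(filename);
X = textscan(fid, formatSpec, 'Delimiter', ',', ...
             'MultipleDelimsAsOne', true, 'CollectOutput', false);
fclose(fid);
% X is now a 1-by-21 cell array, one cell per column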

Importing a CSV file in Matlab using fopen/fclose that has comments of varying lengths with no specific preceding character

I need to import a CSV file into MATLAB that has ~160,000 rows and 25 columns. The 33rd column has commentary on some rows. The comments are of varying lengths and are textual; for example, a comment might read (without the quotes around it) "mortality due to suppression" (it is a forest inventory database).
The first four columns represent the site and time of the tree measurements.
The ultimate goal of the program is to consolidate the information in the file into smaller files, each of which will hold all trees' data for a unique combination of site and time.
At the moment the import strategy I am using is
fid = fopen('TP07303_v1.csv','r');
tline = fgetl(fid);
% split the title line (header) and call it A
A(1,:) = regexp(tline,'\,','split');
% parse and read the rest of the file
ctr = 1;
while(~feof(fid))
    if ischar(tline)
        ctr = ctr + 1;
        tline = fgetl(fid);
        A(ctr, :) = regexp(tline,'\,','split');
    else
        break;
    end
end
fclose(fid);
But when I get to the first line with a comment, it snags. I don't need the comments for anything I am doing and am happy to just not import that column entirely. Is there a way to do this?
I am also confused about the number of columns (25 or 33+?), but if the number of commas on each line varies, that is why you are having problems. Is that why you are not using dlmread?
Also the ischar seems redundant since tline will always be a char, unless you are at the end of the file, but you are looping on that condition. In effect, you are checking the same thing twice in a row.
while(~feof(fid))
    ctr = ctr + 1;
    tline = fgetl(fid);
    tmp = regexp(tline,'\,','split');
    % make sure to not assign more than 25 elements to this row of A
    A(ctr, :) = tmp(1:25);
    if numel(tmp) > 25
        % do something with the rest
    end
end
and there would never be fewer than 25 elements, right?
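Putting the pieces together, a hedged sketch of the complete reading loop (using the file name from the question, truncating the header row the same way, and assuming every line has at least 25 comma-separated fields):
fid = fopen('TP07303_v1.csv','r');
tline = fgetl(fid);
hdr = regexp(tline,'\,','split');
A(1,:) = hdr(1:25);                    % keep only the first 25 header fields
ctr = 1;
while ~feof(fid)
    ctr = ctr + 1;
    tline = fgetl(fid);
    tmp = regexp(tline,'\,','split');
    A(ctr,:) = tmp(1:25);              % drop the comment column(s)
end
fclose(fid);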

Matlab .txt file analyses

I am analyzing a set of text in a .txt file. The file has 30 lines and each line contains different phrases made up of text, numbers, and symbols.
What's the best way to import this file into MATLAB for analysis (e.g. how many capital I's are in the text file, or how many #text phrases are in the file)? I am analyzing tweets, one per line.
I think you'd best read the file line-by-line and save each line in a cell of a cell array:
fid = fopen(filename);
txtlines = cell(0);
tline = fgetl(fid);
while ischar(tline)
    txtlines{numel(txtlines)+1} = tline;
    tline = fgetl(fid);
end
fclose(fid);
This way you can easily access each line with txtlines{ii}.
If you always need to perform operations on the complete text (e.g. how many a's in the whole text file, and not per line), you can of course just throw the lines together in a single variable.
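For example, one way to do that (strjoin requires R2013a or newer; an sprintf-based join would work on older releases):
alltext = strjoin(txtlines, ' ');   % all lines concatenated into one char vector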
Executing an operation on each line can be done simply with cellfun, for example counting the number of capital 'I's:
capI_per_line = cellfun(@(str) numel(strfind(str,'I')), txtlines);
If the file is reasonably sized (and most 30-line files are), I would read it all into memory at once.
fid = fopen('saturate.m');
str = fread(fid,inf,'*char')';
fclose(fid);
Then, depending on your needs you can use basic matrix operations, string operations or regexp style analysis on the str variable.
For example, "how many capital 'I's?" is:
numIs = sum(str=='I');
Or, "how many instances of 'someString'?" is:
numSomeString = length(strfind(str, 'someString'));
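And for the other example from the question, counting the '#text' phrases (hashtags) in the whole file, a hedged possibility is a regexp match on the same str variable:
numHashtags = numel(regexp(str, '#\w+', 'match'));   % count '#word' occurrences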