I am trying to read a large 3 GB text file into MATLAB, organized by a header with names, and with a delimiter of a space (see below fruit.txt) However, the only data needed is the Grapes Column. Since it is a huge file, I am using a loop below to only read in one column into Matlab. How can I read in only one column of data with the loop below? I have to use a loop and preselection of needed columns since the file is over 3 GB of data.
fruit.txt
Apples Grapes Oranges
3 4 A
4 G 1
6 A 3
3 4 1
A 6 1
2 2 4
filename = 'fruit.txt'
delimiter = ' ';
formatSpec = '%s%s%s[^\n\r]';
fileID = fopen(filename, 'r' ) ;
out = {};
k = 0 ;
while ~feof(fileID)
k = k+1;
C = textscan(fileID, formatSpec, 'Delimiter', delimiter);
out{end+1} = Grapes{:,2};
end
Use readmatrix and specify one header row and that you only want column 2:
readmatrix(filename, 'FileType','text', 'Delimiter', delimiter, 'NumHeaderLines', 1, 'Range', 'B:B');
Related
Data sample:
1
2
3
-
4
5
-
6
7
8
I have a file with one number per line and there is a separator. How can I get an array like that:
output = [[1,2,3], [4,5], [6,7,8]]
Thanks
Read the file contents into int vector fscanf(fid,'%d',[1 inf]) and skip one string fscanf(fid,'%s', 1), then continue reading a vector and one string until end-of-file.
out = {};
fid = fopen('file.txt');
while ~feof(fid)
out{end+1:end+1} = fscanf(fid,'%d',[1 inf]);
fscanf(fid,'%s', 1);
end
fclose(fid);
out =
1×3 cell array
{[1 2 3]} {[4 5]} {[6 7 8]}
I am trying to read a binary file that has 4 header lines and the rest are data. The data is of the form abcd, where abc are 3 values having double precision and d is an unsigned short (no spaces between the numbers or the sets of numbers). Below is the code that I tried and the resultant 'data1' has no data in it. Any suggestions?
filename1 = 'myfile.bin';
fid = fopen(filename1, 'r');
for k = 1 : 4 %ignore the first 4 lines
fgetl(fid);
end
data1 = fscanf(fid,'%f%f%f%ushort',Inf);
fclose(fid)
I have a huge text file in the following format:
1 2
1 3
1 10
1 11
1 20
1 376
1 665255
2 4
2 126
2 134
2 242
2 247
First column is the x coordinate while second column is the y coordinate.
It indicates that if I had to construct a Matrix
M = zeros(N, N);
M(1, 2) = 1;
M(1, 3) = 1;
.
.
M(2, 247) = 1;
This text file is huge and can't be brought to main memory at once. I must read it line by line. And save it in a sparse matrix.
So I need the following function:
function mat = generate( path )
fid = fopen(path);
tline = fgetl(fid);
% initialize an empty sparse matrix. (I know I assigned Mat(1, 1) = 1)
mat = sparse(1);
while ischar(tline)
tline = fgetl(fid);
if ischar(tline)
C = strsplit(tline);
end
mat(C{1}, C{2}) = 1;
end
fclose(fid);
end
But unfortunately besides the first row it just puts trash in my sparse mat.
Demo:
1 7
1 9
2 4
2 9
If I print the sparse mat I get:
(1,1) 1
(50,52) 1
(49,57) 1
(50,57) 1
Any suggestions ?
Fixing what you have...
Your problem is that C is a cell array of characters, not numbers. You need to convert the strings you read from the file into integer values. Instead of strsplit you can use functions like str2num and str2double. Since tline is a space-delimited character array of integers in this case, str2num is the easiest to use to compute C:
C = str2num(tline);
Then you just index C like an array instead of a cell array:
mat(C(1), C(2)) = 1;
Extra tidbit: If you were wondering how your demo code still worked even though C contained characters, it's because MATLAB has a tendency to automatically convert variables to the correct type for certain operations. In this case, the characters were converted to their double ASCII code equivalents: '1' became 49, '2' became 50, etc. Then it used these as indices into mat.
A simpler alternative...
You don't even have to bother with all that mess above, since you can replace your entire function with a much simpler approach using dlmread and sparse like so:
data = dlmread(filePath);
mat = sparse(data(:, 1), data(:, 2), 1);
clear data; % Save yourself some memory if you don't need it any more
I'm currently trying to export multiple matrices of unequal lengths into a delimited .txt file thus I have been padding the shorter matrices with 0's such that dlmwrite can use horzcat without error:
dlmwrite(filename{1},[a,b],'delimiter','\t')
However ideally I do not want the zeroes to appear in the .txt file itself - but rather the entries are left blank.
Currently the .txt file looks like this:
55875 3.1043e+05
56807 3.3361e+05
57760 3.8235e+05
58823 4.2869e+05
59913 4.3349e+05
60887 0
61825 0
62785 0
63942 0
65159 0
66304 0
67509 0
68683 0
69736 0
70782 0
But I want it to look like this:
55875 3.1043e+05
56807 3.3361e+05
57760 3.8235e+05
58823 4.2869e+05
59913 4.3349e+05
60887
61825
62785
63942
65159
66304
67509
68683
69736
70782
Is there anyway I can do this? Is there an alternative to dlmwrite which will mean I do not need to have matrices of equal lengths?
If a is always longer than b you could split vector a into two vectors of same length as vector b and the rest:
a = [1 2 3 4 5 6 7 8]';
b = [9 8 7 ]';
len = numel(b);
dlmwrite( 'foobar.txt', [a(1:len), b ], 'delimiter', '\t' );
dlmwrite( 'foobar.txt', a(len+1:end), 'delimiter', '\t', '-append');
You can read in the numeric data and convert to string and then add proper whitespaces to have the final output as string based cell array, which you can easily write into the output text file.
Stage 1: Get the cell of strings corresponding to the numeric data from column vector inputs a, b, c and so on -
%// Concatenate all arrays into a cell array with numeric data
A = [{a} {b} {c}] %// Edit this to add more columns
%// Create a "regular" 2D shaped cell array to store the cells from A
lens = cellfun('length',A)
max_lens = max(lens)
A_reg = cell(max_lens,numel(lens))
A_reg(:) = {''}
A_reg(bsxfun(#le,[1:max_lens]',lens)) = cellstr(num2str(vertcat(A{:}))) %//'
%// Create a char array that has string data from input arrays as strings
wsp = repmat({' '},max_lens,1) %// Create whitespace cell array
out_char = [];
for iter = 1:numel(A)
out_char = [out_char char(A_reg(:,iter)) char(wsp)]
end
out_cell = cellstr(out_char)
Stage 2: Now, that you have out_cell as the cell array that has the strings to be written to the text file, you have two options next for the writing operation itself.
Option 1 -
dlmwrite('results.txt',out_cell(:),'delimiter','')
Option 2 -
outfile = 'results.txt';
fid = fopen(outfile,'w');
for row = 1:numel(out_cell)
fprintf(fid,'%s\n',out_cell{row});
end
fclose(fid);
Essentially I am writing a Matlab file to change the 2nd, 3rd and 4th numbers in the line below "STR" and above "CON" in the text file (which is given below and is called '0.dat'). Currently, my Matlab code makes no changes to the text file.
Text File
pri
3
len 0.03
vic 5 5
MAT
1 147E9 0.3 0 4.9E9 8.5E9
LAY
1 0.000125 1 45
2 0.000125 1 0
3 0.000125 1 -45
4 0.000125 1 90
5 0.000125 1 45
WAL
1 1 2 3 4 5
PLATE
1 0.005 1 1
STR
1 32217.442335442 3010.34241024889 2689.48842888812
CON
1 2 1 2 3 1 3 4 1 4 5 1 5 6 1 6 7 1
ATT
1 901 7 901
LON
34
POI
123456
1 7
X 0.015
123456
2 6
X 0.00381966011250105 0.026180339887499
123456
3 5
X 0.000857864376269049 0.0291421356237309
123456
4
X 0
PLO
2 3
CRO
0
RES
INMOD=1
END
Matlab code:
impafp = importdata('0.dat','\t');
afp = impafp.textdata;
fileID = fopen('0.dat','r+');
for i = 1:length(afp)
if (strncmpi(afp{i},'con',3))
newNx = 100;
newNxy = 50;
newNy = 500;
myformat = '%0.6f %0.9f %0.9f %0.9f\n';
newData = [1 newNx newNxy newNy];
afp{i-1} = fprintf(fileID, myformat, newData);
fclose(fileID);
end
end
From the help for importdata:
For ASCII files and spreadsheets, importdata expects to find numeric
data in a rectangular form (that is, like a matrix). Text headers can
appear above or to the left of numeric data. To import ASCII files
with numeric characters anywhere else, including columns of character
data or formatted dates or times, use TEXTSCAN instead of import data.
Indeed, if you print out the value of afp, you'll see that it just contains the first line. You were also not performing any operation that was writing to a file. And you were not closing the file ID if the if state wasn't triggered.
Here is one way to do this with textscan (which is probably faster too):
% Read in data as strings using textscan
fid = fopen('0.dat','r');
afp = textscan(fid,'%s','Delimiter','');
fclose(fid);
isSTR = strncmpi(afp{:},'str',3); % True for all lines starting with STR
isCON = strncmpi(afp{:},'con',3); % True for all lines starting with CON
% Find indices to replace - create logical indices
% True if line before is STR and line after is CON
% Offset isSTR and isCON by 2 elements in opposite directions to align
% Use & to perform vectorized AND
% Pad with FALSE on either side to make output the same length as afp{1}{:}
datIDX = [false;(isSTR(1:end-2)&isCON(3:end));false];
% Overwrite data using sprintf
myformat = '%0.6f %0.9f %0.9f %0.9f';
newNx = 100;
newNxy = 50;
newNy = 500;
newData = [1 newNx newNxy newNy];
afp{1}{datIDX} = sprintf(myformat, newData); % Set only elements that pass test
% Overwrite old file using fprintf (or change filename to new one)
fid = fopen('0.dat','w');
fprintf(fid,'%s\r\n',afp{1}{1:end-1});
fprintf(fid,'%s',afp{1}{end}); % Avoid blank line at end
fclose(fid);
If you're unfamiliar with logical indexing, you might read this blog post and this.
I would recommend just reading the entire file in, finding which lines contain your "keywords", modifying specific lines, and then writing it back out to a file, which can have the same name or a different one.
file = fileread('file.dat');
parts = regexp(file,'\n','split');
startIndex = find(~cellfun('isempty',regexp(parts,'STR')));
endIndex = find(~cellfun('isempty',regexp(parts,'CON')));
ind2Change = startIndex+1:endIndex-1;
tempCell{1} = sprintf('%0.6f %0.9f %0.9f %0.9f',[1,100,50,500]);
parts(ind2Change) = deal(tempCell);
out = sprintf('%s\n',parts{:});
out = out(1:end-1);
fh = fopen('file2.dat','w');
fwrite(fh,out);
fclose(fh);