I have a textfile with the following structure:
1999-01-04
1,100.00
1,060.00
1,092.50
0
6,225
1,336,605
37
1999-01-05
1,122.50
1,087.50
1,122.50
0
3,250
712,175
14
...
The file contains repeated sets of eight values (a date followed by seven numbers, each on their own line).
I want to read it into MATLAB and get the values into different vectors. I've tried to accomplish this with several different methods, but none have worked - all output some sort of error.
In case it's important, I'm doing this on a Mac.
EDIT: This is a shorter version of the code I previously had in my answer...
If you'd like to read your data file directly, without having to preprocess it first as dstibbe suggested, the following should work:
fid = fopen('datafile.txt','rt');
data = textscan(fid,'%s %s %s %s %s %s %s %s','Delimiter','\n');
fclose(fid);
data = [datenum(data{1}) cellfun(@str2double,[data{2:end}])]';
The above code places each set of 8 values into an 8-by-N matrix, with N being the number of 8 line sets in the data file. The date is converted to a serial date number so that it can be included with the other double-precision values in the matrix. The following functions (used in the above code) may be of interest: TEXTSCAN, DATENUM, CELLFUN, STR2DOUBLE.
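Since the question asks for separate vectors, here is a minimal sketch of one way to split the result up. It is a variant rather than the code above: it reads every line as a single column of strings and reshapes, and it writes the two sample records to a temporary file only so that it runs on its own. The names sample, nums, col1 and col6 are placeholders, since the question does not say what each column means.

```matlab
% Write the two sample records from the question to a temporary file
% (only so that this sketch is self-contained).
sample = {'1999-01-04','1,100.00','1,060.00','1,092.50','0','6,225','1,336,605','37', ...
          '1999-01-05','1,122.50','1,087.50','1,122.50','0','3,250','712,175','14'};
fname = [tempname '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, '%s\n', sample{:});
fclose(fid);

% Read every line as a string, then put one 8-line record in each column.
fid = fopen(fname, 'rt');
raw = textscan(fid, '%s', 'Delimiter', '\n');
fclose(fid);
delete(fname);
raw = reshape(raw{1}, 8, []);

% First row holds the dates; the remaining seven rows hold the numbers.
dates = datenum(raw(1,:).', 'yyyy-mm-dd');            % N-by-1 serial dates
nums  = str2double(strrep(raw(2:end,:), ',', '')).';  % N-by-7, commas removed

% One vector per column (placeholder names).
col1 = nums(:,1);
col6 = nums(:,6);
```

The reshape assumes the file holds complete 8-line records with no blank lines in between.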
I propose yet another solution. This one is the shortest in MATLAB code. First, using sed, we format the file as a CSV file (comma-separated, with each record on one line):
cat a.dat | sed -e 's/,//g ; s/[ \t]*$/,/g' -e '0~8 s/^\(.*\),$/\1\n/' |
sed -e :a -e '/,$/N; s/,\n/,/; ta' -e '/^$/d' > file.csv
Explanation: First we get rid of the thousands comma separator, and trim spaces at the end of each line, adding a comma. Then we remove that trailing comma on every 8th line (note that the 0~8 step address is a GNU sed extension). Finally we join the lines and remove empty ones.
The output will look like this:
1999-01-04,1100.00,1060.00,1092.50,0,6225,1336605,37
1999-01-05,1122.50,1087.50,1122.50,0,3250,712175,14
Next in MATLAB, we simply use textscan to read each line, with the first field as a string (to be converted to a serial date number), and the rest as numbers:
fid = fopen('file.csv', 'rt');
a = textscan(fid, '%s %f %f %f %f %f %f %f', 'Delimiter',',', 'CollectOutput',1);
fclose(fid);
M = [datenum(a{1}) a{2}]
and the resulting matrix M is:
730124 1100 1060 1092.5 0 6225 1336605 37
730125 1122.5 1087.5 1122.5 0 3250 712175 14
Use a script to modify your text file into something that Matlab can read.
eg. make it a matrix:
M = [
1999-01-04
1,100.00
1,060.00
1,092.50
0
6,225
1,336,605; <-- notice the ';'
37
1999-01-05
1,122.50
1,087.50
1,122.50
0
3,250; <-- notice the ';'
712,175
14
...
]
Import this into MATLAB and read the various vectors from the matrix.
Note: my MATLAB is a bit rusty, so this might contain errors.
It isn't entirely clear what form you want the data to be in once you've read it. The code below puts it all in one matrix, with each row representing a group of 8 rows in your text file. You may wish to use different variables for different columns, or (if you have access to the Statistics toolbox) use a dataset array.
% Read file as text
text = fileread('c:/data.txt');
% Split by line
x = regexp(text, '\n', 'split');
% Remove commas from numbers
x = regexprep(x, ',', '');
% Number of items per object
n = 8;
% Get dates
index = 1:length(x);
dates = datenum(x(rem(index, n) == 1));
% Get other numbers
nums = str2double(x(rem(index, n) ~= 1));
nums = reshape(nums, (n-1), length(nums)/(n-1))';
% Combine dates and numbers
thedata = [dates nums];
You could also look into the function textscan for alternative ways of solving the problem.
Similar to Richie's. Using str2double to convert the file strings to doubles. This implementation processes line by line instead of breaking the file up with a regular expression. The output is a cell array of individual vectors.
function vectors = readdata(filename)
fid=fopen(filename);
tline = fgetl(fid);
counter = 0;
vectors = cell(7,1);
while ischar(tline)
disp(tline)
if counter > 0
vectors{counter} = [vectors{counter} str2double(tline)];
end
counter = counter + 1;
if counter > 7
counter = 0;
end
tline = fgetl(fid);
end
fclose(fid);
This has regular expression checking to make sure your data is formatted well.
fid = fopen('data.txt','rt');
%these will be your 8 value arrays
val1 = [];
val2 = [];
val3 = [];
val4 = [];
val5 = [];
val6 = [];
val7 = [];
val8 = [];
linenum = 0; % line number in file
valnum = 0; % number of value (1-8)
while 1
line = fgetl(fid);
linenum = linenum+1;
if valnum == 8
valnum = 1;
else
valnum = valnum+1;
end
%-- if reached end of file, end
if isempty(line) | line == -1
fclose(fid);
break;
end
switch valnum
case 1
pat = '(?<yr>\d{4})-(?<mo>\d{2})-(?<dy>\d{2})'; % val1 (e.g. 1999-01-04)
case 2
pat = '(?<val>\d*[,]*\d*[,]*\d*[.]\d{2})'; % val2 (e.g. 1,100.00) [valid up to 1billion-1]
case 3
pat = '(?<val>\d*[,]*\d*[,]*\d*[.]\d{2})'; % val3 (e.g. 1,060.00) [valid up to 1billion-1]
case 4
pat = '(?<val>\d*[,]*\d*[,]*\d*[.]\d{2})'; % val4 (e.g. 1,092.50) [valid up to 1billion-1]
case 5
pat = '(?<val>\d+)'; % val5 (e.g. 0)
case 6
pat = '(?<val>\d*[,]*\d*[,]*\d+)'; % val6 (e.g. 6,225) [valid up to 1billion-1]
case 7
pat = '(?<val>\d*[,]*\d*[,]*\d+)'; % val7 (e.g. 1,336,605) [valid up to 1billion-1]
case 8
pat = '(?<val>\d+)'; % val8 (e.g. 37)
otherwise
error('bad linenum')
end
l = regexp(line,pat,'names'); % l is for line
if length(l) == 1 % match
if valnum == 1
serialtime = datenum(str2num(l.yr),str2num(l.mo),str2num(l.dy)); % convert to matlab serial date
val1 = [val1;serialtime];
else
this_val = strrep(l.val,',',''); % strip out comma and convert to number
eval(['val',num2str(valnum),' = [val',num2str(valnum),';',this_val,'];']) % save this value into appropriate array
end
else
warning(['line number ',num2str(linenum),' skipped! [didnt pass regexp]: ',line]);
end
end
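To see what the named tokens do, here is a small sketch that runs two of the patterns on literal strings. It assumes the token names yr, mo, dy and val that the loop above reads from l, and it stores results in a cell array (vals, a name made up here) instead of building variable names with eval.

```matlab
% Date pattern with named tokens: the names become fields of the result.
datePat = '(?<yr>\d{4})-(?<mo>\d{2})-(?<dy>\d{2})';
d = regexp('1999-01-04', datePat, 'names');
serialtime = datenum(str2double(d.yr), str2double(d.mo), str2double(d.dy));

% Integer pattern with optional thousands separators.
intPat = '(?<val>\d*[,]*\d*[,]*\d+)';
v = regexp('1,336,605', intPat, 'names');
number = str2double(strrep(v.val, ',', ''));

% Collect values per field in a cell array rather than val1..val8 plus eval.
vals = cell(1, 8);
vals{1} = [vals{1}; serialtime];
vals{7} = [vals{7}; number];
```

Indexing a cell array by valnum keeps the one-array-per-field layout while avoiding code built from strings.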
Related
In short, I'm struggling in multiple languages to read a txt file (linked below). My most familiar language is MATLAB, so that is what I'm using in this example. I've found a way to read this file in ~5 minutes, but my instrument measures every 30 seconds all day and I'll soon have tons of data, so this just isn't feasible.
I'm looking for a way to quickly read these irregular text files so that going forward I can knock these out with less of a time burden.
You can find my exact data at this link:
http://lb3.pandonia.net/BostonMA/Pandora107s1/L0/Pandora107s1_BostonMA_20190814_L0.txt.bz2
I've been using the "readtable" function in matlab and I have achieved a final product I want but I'm looking to increase the speed
Below is my code!
clearvars -except pan day1; % Clearing all variables except for the day and instrument variables.
close all;
clc;
pan_mat = [107 139 155 153]; % Matrix of pandora numbers for file-choosing reasons.
pan = pan_mat(pan); % pandora number I'm choosing
pan = num2str(pan); % Turning Pandora number into a string.
%pan = '107'
pandora = strcat('C:\Users\tadams15\Desktop\Folders\Counts\Pandora_Dta\',pan)
% string that designates file location
%date = '90919'
month = '09'; % Month
day2 = strcat('0',num2str(day1)) % Creating a day name for the figure I ultimately produce
cd(pandora)
d2 = strcat('2019',num2str(month),num2str(day2)); % The final date variable for the figure I produce
%file_pan = 'Pandora107s1_BostonMA_20190909_L0';
file_pan = strcat('Pandora',pan,'s1_BostonMA_',d2,'_L0'); % File name string
%Try reading it in line by line?
% Load in as a string and then convert the lines you want as numbers into
% number.
delimiterIn = '\t';
headerlinesIn = 41;
A = readtable(file_pan,'HeaderLines', 41, 'Delimiter', '\t'); %Reading the file as a table
A = table2cell(A); % Converting file to a cell
A = regexp(A, ' ', 'split'); % converting cell to a structure matrix.
%%
A= array2table(A); % Converting Structure matrix back to table
row_num = 0;
pan_mat_2 = zeros(2359,4126);
datetime_mat = zeros(2359,2);
blank = 0;
%% Converting data to proper matrices
[length width] = size(A);
% The matrix below is going through "A" and writing from it to a new
% matrix, "pan_mat_2" which is my final product as well as singling out the
% rows that contain non-number variables I'd like to keep and adding them
% later.
tic
%flag1
for i = 1:length; % Make second number the length of the table, A
blank = 0;
b = table2array(A{i,1});
[rows, columns] = size(b);
if columns > 4120 && columns < 4140
row_num = row_num + 1;
blank = regexp(b(2), 'T', 'split');
blank2 = regexp(blank{1,1}(2), 'Z', 'split');
datetime_mat(row_num,1) = str2double(blank{1,1}(1));
datetime_mat(row_num,2) = str2double(blank2{1,1}(1));
for j = 1:4126;
pan_mat_2(row_num,j) = str2double(b(j));
end
end
end
toc
%flag2
In short, I'm already getting the result I want but the part of the code where I'm writing to a new array "flag 1" to "flag 2" is taking roughly 222 seconds while the entire code only takes about 248 seconds. I'd like to find a better way to create the data there than to write it to a new array and take a whole bunch of time.
Any suggestions?
Note:
There are quite a few improvements you can make for speed, but there are also corrections to make. You preallocate your final output variable with hard-coded values:
pan_mat_2 = zeros(2359,4126);
But later you populate it in a loop which runs for i = 1:length.
length is the total number of lines read from the file. In your example file there are only 784 lines. So even if all your lines were valid (OK to be parsed), you would only ever fill the first 784 of the 2359 rows you allocated in pan_mat_2. In practice, this file has only 400 valid data lines, so pan_mat_2 could definitely be smaller.
I know you couldn't know that only 400 lines would be parsed before parsing them, but you knew from the beginning that you had only 784 lines to parse (the information was in the variable length). In cases like this, pre-allocate 784 rows and discard the empty rows afterwards.
Fortunately, the solution I propose does not need to pre-allocate a larger matrix and then discard rows. The matrices end up the right size from the start.
The code:
%%
file_pan = 'Pandora107s1_BostonMA_20190814_L0.txt' ;
delimiterIn = '\t';
headerlinesIn = 41;
A = readtable(file_pan,'HeaderLines', 41, 'Delimiter', '\t'); %Reading the file as a table
A = table2cell(A); % Converting file to a cell
A = regexp(A, ' ', 'split'); % converting cell to a structure matrix.
%% Remove lines which won't be parsed
% Count the number of elements in each line
nelem = cell2mat( cellfun( @size , A ,'UniformOutput',0) ) ;
nelem(:,1) = [] ;
% find which lines does not have enough elements to be parsed
idxLine2Remove = ~(nelem > 4120 & nelem < 4140) ;
% remove them from the data set
A(idxLine2Remove) = [] ;
%% Remove nesting in cell array
nLinesToParse = size(A,1) ;
A = reshape( [A{:}] , [], nLinesToParse ).' ;
% now you have a cell array of size [400x4126] cells
%% Now separate the columns with different data type
% Column 1 => [String] identifier
% Column 2 => Timestamp
% Column 3 to 4125 => Numeric values
% Column 4126 => empty cell created during the 'split' operation above
% because of a trailing space character.
LineIDs = A(:,1) ;
TimeStamps = A(:,2) ;
Data = A(:,3:end-1) ; % fetch to "end-1" to discard last empty column
%% now extract the values
% You could do that directly:
% pan_mat = str2double(Data) ;
% but this takes a long time. A computationally much faster way (even if it
% uses more complex code) would be:
dat = strjoin(Data) ; % create a single long string made of all the strings in all the cells
nums = textscan( dat , '%f' , Inf ) ; % call textscan on it (way faster than str2double() )
pan_mat = reshape( cell2mat( nums ) , nLinesToParse ,[] ) ; % reshape to original dimensions
%% timestamps
% convert to character array
strTimeStamps = char(TimeStamps) ;
% convert to matlab own datetime numbering. This will be a lot faster if
% you have operations to do on the time stamps later
ts = datenum(strTimeStamps,'yyyymmddTHHMMSSZ') ;
%% If you really want them the way you had it in your example
strTimeStamps(:,9) = ' ' ; % replace 'T' with ' '
strTimeStamps(:,end) = ' ' ; % replace 'Z' characters with ' '
%then same again, merge into a long string, parse then reshape accordingly
strdate = reshape(strTimeStamps.',1,[]) ;
tmp = textscan( strdate , '%d' , Inf ) ;
datetime_mat = reshape( double(cell2mat(tmp)),2,[]).' ;
The performance:
As you can see on my machine your original code takes ~102 seconds to execute, with 80% of that (81s) spent on calling the function str2double() 3,302,400 times!
My solution, run on the same input file, takes ~5.5 seconds, with half of the time spent on calling strjoin() 3 times.
When you read the code above, try to understand how I limited repeated function calls in lengthy loops by keeping everything as vectorised as possible.
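The join-then-parse idea can be tried on a tiny example. This is only a sketch: Data here is a made-up 2-by-3 cell array of numeric strings standing in for the real one, and the speed difference reported in this answer only shows up on large inputs.

```matlab
% Small stand-in for the cell array of numeric strings (2 rows, 3 columns).
Data = {'1.5', '2', '3'; '4', '5e2', '6'};

% Slow route on large inputs: one conversion per cell.
slow = str2double(Data);

% Faster route: join everything into one string (column-major order),
% parse it in a single call, then restore the original shape.
joined = strjoin(Data(:).', ' ');
parsed = textscan(joined, '%f');
fast = reshape(parsed{1}, size(Data));
```

Passing Data(:).' makes the column-major ordering explicit, which is what lets reshape put every value back in its original row and column.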
Using the profiler, you can see that you call str2double 3302400 times in a run, which takes about 80% of the total time on my PC. That's suboptimal, since each call translates only one value, and as far as your code goes you don't need the values as strings again. I added this under your original code:
row_num = 0;
pan_mat_2_b = cell(2359,4126);
datetime_mat_b = cell(2359,2);%not zeros
blank = 0;
tic
%flag1
for i = 1:length % Make second number the length of the table, A
blank = 0;
b = table2array(A{i,1});
[rows, columns] = size(b);
if columns > 4120 && columns < 4140
row_num = row_num + 1;
blank = regexp(b(2), 'T', 'split');
blank2 = regexp(blank{1,1}(2), 'Z', 'split');
%datetime_mat(row_num,1) = str2double(blank{1,1}(1));
%datetime_mat(row_num,2) = str2double(blank2{1,1}(1));
datetime_mat_b(row_num,1) = blank{1,1}(1);
datetime_mat_b(row_num,2) = blank2{1,1}(1);
pan_mat_2_b(row_num,:) = b;
% for j = 1:4126
% pan_mat_2(row_num,j) = str2double(b(j));
% end
end
end
datetime_mat_b = datetime_mat_b(~all(cellfun('isempty',datetime_mat_b),2),:);
pan_mat_2_b=pan_mat_2_b(~all(cellfun('isempty',pan_mat_2_b),2),:);
datetime_mat_b=str2double(string(datetime_mat_b));
pan_mat_2_b=str2double(pan_mat_2_b);
toc
Still not great, but better. If you want to speed this up further, I recommend taking a closer look at the readtable part: you can save quite some time if you read the data in as doubles right from the beginning.
I'm attempting to count the number of letters in a text file, but unfortunately I keep getting stuck if numbers are involved.
So far I have been able to deal with letters and symbols, but unfortunately the ischar function doesn't help me when it comes to numbers.
function ok = lets(file_name)
fid = fopen(file_name, 'rt');
if fid < 0
ok = -1;
end
C = [];
D = [];
oneline = fgets(fid);
while ischar(oneline)
C = oneline(isletter(oneline));
W = length(C);
D = [D ; W];
oneline = fgets(fid);
end
total = 0;
for i = 1:length(D)
total = D(i) + total;
end
ok = total;
How can I deal with counting letters if there are also numbers in a text file?
I approached the problem the following way:
function ok = lets(file_name)
file = memmapfile( file_name, 'writable', false );
uppercase = [65:90];
lowercase = [97:122];
data = file.Data;
ok = sum(histc(data,lowercase)+histc(data,uppercase));
end
I mapped the file to memory using the memmapfile function and compared the data with the character codes from the ASCII table. Upper case letters are represented by [65:90] and lower case letters by [97:122]. By applying the histc function, I got the frequency with which each letter appeared in the file. The total number of letters is given by adding all the frequencies up.
Note that I called histc twice to avoid having a bin from 90 to 97, which would count the [\]^_` characters.
I applied the function to a sample file called sample.txt containing the following lines:
abc23D&f![
k154&¨&skj
djaljaljds
Here is my output:
>> lets('sample.txt')
Elapsed time is 0.017783 seconds.
ans =
19
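The same count can be checked on an in-memory string. This is a sketch that swaps histc for plain range comparisons on the character codes, and it uses only the two pure-ASCII lines of the sample so that it does not depend on the file's encoding.

```matlab
% Two of the sample lines, held in memory instead of a mapped file.
txt = sprintf('abc23D&f![\ndjaljaljds\n');
codes = double(txt);

% ASCII codes 65..90 are 'A'..'Z' and 97..122 are 'a'..'z'.
isUpper = codes >= 65 & codes <= 90;
isLower = codes >= 97 & codes <= 122;
nLetters = sum(isUpper | isLower);

% isletter gives the same count for plain ASCII text.
nCheck = sum(isletter(txt));
```

The first line contributes 5 letters and the second 10, so both counts come out at 15.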
Edit:
Outputting ok=-1 for problems reading file:
function ok = lets(file_name)
try
file = memmapfile( file_name, 'writable', false );
catch
file=[];
ok=-1;
end
if ~isempty(file)
uppercase = [65:90];
lowercase = [97:122];
data = file.Data;
ok = sum(histc(data,lowercase)+histc(data,uppercase));
end
end
With fopen approach, since you get the ok=-1 "by default":
function ok = lets(file_name)
fid = fopen(file_name, 'rt');
if fid < 0
ok = -1;
else
celldata=textscan(fid,'%s');
fclose(fid);
uppercase = [65:90];
lowercase = [97:122];
data = uint8([celldata{1}{:}]);
ok = sum(histc(data,lowercase)+histc(data,uppercase));
end
end
I think you are making this a lot more complicated than it needs to be. Just use isletter like you had, and then use length.
function ok = lets(file_name)
%Original code as you had it
fid = fopen(file_name, 'rt');
if fid < 0
ok = -1;
return
end
%Initialize length
ok = 0;
%Get first line
oneline = fgets(fid);
%While there are lines left to read
while ischar(oneline)
%remove everything that's not a letter
oneline(~isletter(oneline)) = [];
%Add number of letters to output
ok = ok + length(oneline);
%Get next line
oneline = fgets(fid);
end
fclose(fid);
end
I used the input file,
Ar,TF,760,2.5e-07,1273.14,4.785688323049946e+24,24.80738364864047,37272905351.7263,37933372595.0276
Ar,TF,760,5e-07,1273.14,4.785688323049946e+24,40.3092219226107,2791140681.70926,2978668073.513113
Ar,TF,760,7.5e-07,1273.14,4.785688323049946e+24,54.80989010679312,738684259.1671219,836079550.0157251
and got 18, this counts the e's in the numbers, do you want these to be counted?
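If the e's inside the numbers should not count, one possible approach is to strip the numeric tokens before calling isletter. This is a sketch under that assumption about what you want; the regular expression and the variable names are mine, and the pattern only covers plain and scientific-notation numbers.

```matlab
% One line of the input file used above.
line = 'Ar,TF,760,2.5e-07,1273.14,4.785688323049946e+24';

% Counting every letter includes the exponent markers of the numbers.
allLetters = sum(isletter(line));

% Remove numeric tokens (optional sign, digits, optional fraction and
% exponent), then count what is left.
noNums = regexprep(line, '[-+]?\d+\.?\d*([eE][-+]?\d+)?', '');
wordLetters = sum(isletter(noNums));
```

Here allLetters is 6 (A, r, T, F and two e's) and wordLetters is 4.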
I'm trying to read in a text file with MATLAB.
The file is in this format:
string number number
string number number
....
I'd like to skip the lines which start with a specific string. For any other string, I want to save the two numbers in that line.
Let's take this sample file file.txt:
badstring 1 2
badstring 3 4
goodstring 5 6
badstring 7 8
goodstring 9 10
If a line starts with badstring we skip it, otherwise we store the two numbers following the string.
fid = fopen('file.txt');
nums = textscan(fid, '%s %f %f');
fclose(fid);
ind = find(strcmp(nums{1},'badstring'));
nums = cell2mat(nums(:,2:end));
nums(ind,:) = [];
display(nums)
This will read the entire file into a cell array, then convert it to a matrix (without the strings), and then kill any row which originally started with badstring. Alternatively, if the file is very large, you can avoid the temporary storage of all the lines with this iterative solution:
fid = fopen('file.txt');
line = fgetl(fid);
numbers = [];
while ischar(line) % read file until EOF
line = textscan(line, '%s %f %f');
if ~strcmp(line{1}, 'badstring')
numbers = [numbers; line{2} line{3}];
end
line = fgetl(fid);
end
fclose(fid);
display(numbers)
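The first approach can also be written with a logical mask instead of find followed by deletion. A minimal sketch, parsing the sample from a string rather than from file.txt so that it runs on its own; the variable names txt, C and keep are placeholders.

```matlab
% The sample file content, held in a string for this sketch.
txt = sprintf('badstring 1 2\nbadstring 3 4\ngoodstring 5 6\nbadstring 7 8\ngoodstring 9 10\n');

% Parse all lines at once: one cell of strings and two numeric columns.
C = textscan(txt, '%s %f %f');

% Keep only the rows whose leading string is not 'badstring'.
keep = ~strcmp(C{1}, 'badstring');
nums = [C{2}(keep) C{3}(keep)];
```

For the sample this leaves the rows 5 6 and 9 10.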
This is a follow up question to
Reading parameters from a text file into the workspace
I am wondering, how would I read the following:
% ---------------------- details --------------------------
%---------------------------------------------------------------
% location:
lat = 54.35
lon = -2.9833
%
Eitan T suggested using:
fid = fopen(filename);
C = textscan(fid, '%[^= ]%*[= ]%f', 'CommentStyle', '%')
fclose(fid);
to obtain the information from the file and then
lat = C{2}(strcmp(C{1}, 'lat'));
lon = C{2}(strcmp(C{1}, 'lon'));
to obtain the relevant parameters. How could I alter this to read the following:
% ---------------------- details --------------------------
%---------------------------------------------------------------
% location:
lat = 54.35
lon = -2.9833
heights = 0, 2, 4, 7, 10, 13, 16, 19, 22, 25, 30, 35
Volume = 6197333, 5630000, 4958800, 4419400, 3880000, 3340600,
3146800, 2780200, 2413600, 2177000, 1696000, 811000
%
where the variable should contain all of the data points following the equal sign (up until the start of the next variable, Volume in this case)?
Thanks for your help
Here's one method, which uses some filthy string hacking and eval to get the result. This works on your example, but I wouldn't really recommend it:
fid = fopen('data.txt');
contents = textscan(fid, '%s', 'CommentStyle', '%', 'Delimiter', '\n');
contents = contents{1};
for ii = 1:length(contents)
line = contents{ii};
eval( [strrep(line, '=', '=['), '];'] ) % convert to valid Matlab syntax
end
A better method would be to read each of the lines using textscan:
for ii = 1:length(contents)
idx = strfind(contents{ii}, ' = ');
vars{ii} = contents{ii}(1:idx-1);
vals(ii) = textscan(contents{ii}(idx+3:end), '%f', 'Delimiter', ',');
end
Now the variables vars and vals have the names of your variables, and their values. To extract the values you could do something like
ix = strmatch('lat', vars, 'exact');
lat = vals{ix};
ix = strmatch('lon', vars, 'exact');
lon = vals{ix};
ix = strmatch('heights', vars, 'exact');
heights = vals{ix};
ix = strmatch('Volume', vars, 'exact');
volume = vals{ix};
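As a variation on the same idea, the values can be collected in a struct with dynamic field names, which also makes it straightforward to append continuation lines such as the second Volume line. This is a sketch under assumptions: the lines are given as a shortened, hard-coded cell array (comment lines already removed), and any line without an equals sign is treated as a continuation of the previous variable.

```matlab
% Shortened sample lines; comment lines are assumed to be stripped already.
lines = {'lat = 54.35', 'lon = -2.9833', 'heights = 0, 2, 4, 7', ...
         'Volume = 6197333, 5630000,', '3146800, 2780200'};

params = struct();
name = '';
for k = 1:numel(lines)
    % Try to split the line into "name = rest".
    parts = regexp(lines{k}, '^\s*(\w+)\s*=\s*(.*)$', 'tokens', 'once');
    if ~isempty(parts)
        name = parts{1};
        rest = parts{2};
        params.(name) = [];
    else
        rest = lines{k};   % no '=': continuation of the previous variable
    end
    % Parse the comma-separated numbers and append them as a row vector.
    vals = sscanf(strrep(rest, ',', ' '), '%f').';
    params.(name) = [params.(name) vals];
end
```

After the loop, params.lat, params.lon, params.heights and params.Volume hold the values, so no name lookup is needed.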
This can be accomplished using a 2-step approach:
1. Read the leading string (first element), equals sign (ignored), and the rest of the line as a string (second element)
2. Convert these strings-of-the-rest-of-the-lines to floats (second element)
There is however a slight drawback here; your lines seem to follow two formats; one is the one described in step 1), the other is a continuation of the previous line, which contains numbers.
Because of this, an extra step is required:
1. Read the leading string (first element), equals sign (ignored), and the rest of the line as a string (second element)
2. This will fail when the "other format" is encountered. Detect this, correct this, and continue
3. Convert these strings-of-the-rest-of-the-lines to floats (second element)
I think this will do the trick:
fid = fopen('data.txt');
C = [];
try
while ~feof(fid)
% Read next set of data, assuming the "first format"
C_new = textscan(fid, '%[^= ]%*[= ]%s', 'CommentStyle', '%', 'Delimiter', '');
C = [C; C_new]; %#ok
% When we have non-trivial data, check for the "second format"
if ~all(cellfun('isempty', C_new))
% Second format detected!
if ~feof(fid)
% get the line manually
D = fgetl(fid);
% Insert new data from fgetl
C{2}(end) = {[C{2}{end} C{1}{end} D]};
% Correct the cell
C{1}(end) = [];
end
else
% empty means we've reached the end
C = C(1:end-1,:);
end
end
fclose(fid);
catch ME
% Just to make sure to not have any file handles lingering about
fclose(fid);
throw(ME);
end
% convert second elements to floats
C{2} = cellfun(@str2num, C{2}, 'UniformOutput', false);
If you can get rid of the multi-line Volume line, what you have written is valid matlab. So, just invoke the parameter file as a matlab script using the run command.
run(scriptName)
Your only other alternative, as others have shown, is to write what will end up looking like a bastardized Matlab parser. There are definitely better ways to spend your time than doing that!
I'm trying to load the following ascii file into MATLAB using load()
% some comment
1 0xc661
2 0xd661
3 0xe661
(This is actually a simplified file. The actual file I'm trying to load contains an undefined number of columns and an undefined number of comment lines at the beginning, which is why the load function was attractive)
For some strange reason, I obtain the following:
K>> data = load('testMixed.txt')
data =
1 50785
2 58977
3 58977
I've observed that the problem occurs anytime there's a "d" in the hexadecimal number.
Direct hex2dec conversion works properly:
K>> hex2dec('d661')
ans =
54881
importdata seems to have the same conversion issue, and so does the ImportWizard:
K>> importdata('testMixed.txt')
ans =
1 50785
2 58977
3 58977
Is that a bug, am I using the load function in some prohibited way, or is there something obvious I'm overlooking?
Are there workarounds around the problem, save from reimplementing the file parsing on my own?
Edited my input file to better reflect my actual file format. I had a bit oversimplified in my original question.
"GOLF" ANSWER:
This starts with the answer from mtrw and shortens it further:
fid = fopen('testMixed.txt','rt');
data = textscan(fid,'%s','Delimiter','\n','MultipleDelimsAsOne','1',...
'CommentStyle','%');
fclose(fid);
data = strcat(data{1},{' '});
data = sscanf([data{:}],'%i',[sum(isspace(data{1})) inf]).';
PREVIOUS ANSWER:
My first thought was to use TEXTSCAN, since it has an option that allows you to ignore certain lines as comments when they start with a given character (like %). However, TEXTSCAN doesn't appear to handle numbers in hexadecimal format well. Here's another option:
fid = fopen('testMixed.txt','r'); % Open file
% First, read all the comment lines (lines that start with '%'):
comments = {};
position = 0;
nextLine = fgetl(fid); % Read the first line
while strcmp(nextLine(1),'%')
comments = [comments; {nextLine}]; % Collect the comments
position = ftell(fid); % Get the file pointer position
nextLine = fgetl(fid); % Read the next line
end
fseek(fid,position,-1); % Rewind to beginning of last line read
% Read numerical data:
nCol = sum(isspace(nextLine))+1; % Get the number of columns
data = fscanf(fid,'%i',[nCol inf]).'; % Note '%i' works for all integer formats
fclose(fid); % Close file
This will work for an arbitrary number of comments at the beginning of the file. The computation to get the number of columns was inspired by Jacob's answer.
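The '%i' conversion can be tried on the sample data without a file. A minimal sketch using sscanf on a string, with the column count hard-coded to 2 for this sample; note that '%i' also treats a leading 0 as an octal prefix, so it only suits data without zero-padded decimal numbers.

```matlab
% The three data lines from the question, held in a string.
txt = sprintf('1 0xc661\n2 0xd661\n3 0xe661\n');

% '%i' picks the base from the prefix: 0x means hexadecimal, a leading 0
% means octal, anything else is decimal.
nCol = 2;
data = sscanf(txt, '%i', [nCol Inf]).';
```

This gives 50785, 54881 and 58977 in the second column, matching hex2dec.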
New:
This is the best I could come up with. It should work for any number of comment lines and columns. You'll have to do the rest yourself if there are strings, etc.
% Define the characters representing the start of the commented line
% and the delimiter
COMMENT_START = '%';
DELIMITER = ' ';
% Open the file
fid = fopen('testMixed.txt');
% Read each line till we reach the data
l = COMMENT_START;
while(l(1)==COMMENT_START)
l = fgetl(fid);
end
% Compute the number of columns
cols = sum(l==DELIMITER)+1;
% Split the first line
split_l = regexp(l,' ','split');
% Read all the data
A = textscan(fid,'%s');
% Compute the number of rows
rows = numel(A{:})/cols;
% Close the file
fclose(fid);
% Assemble all the data into a matrix of cell strings
DATA = [split_l ; reshape(A{:},[cols rows])']; %' adding this to make it pretty in SO
% Recognize each column and process accordingly
% by analyzing each element in the first row
numeric_data = zeros(size(DATA));
for i=1:cols
str = DATA(1,i);
% If there is no '0x' present
if isempty(findstr(str{1},'0x')) == true
% This is a number
numeric_data(:,i) = str2num(char(DATA(:,i)));
else
% This is a hexadecimal number
col = char(DATA(:,i));
numeric_data(:,i) = hex2dec(col(:,3:end));
end
end
% Display the data
format short g;
disp(numeric_data)
This works for data like this:
% Comment 1
% Comment 2
1.2 0xc661 10 0xa661
2 0xd661 20 0xb661
3 0xe661 30 0xc661
Output:
1.2 50785 10 42593
2 54881 20 46689
3 58977 30 50785
OLD:
Yeah, I don't think LOAD is the way to go. You could try:
a = char(importdata('testHexa.txt'));
a = hex2dec(a(:,3:end));
This is based on both gnovice's and Jacob's answers, and is a "best of breed"
For files like:
% this is my comment
% this is my other comment
1 0xc661 123
2 0xd661 456
% surprise comment
3 0xe661 789
4 0xb661 1234567
(where the number of columns within the file MUST be the same, but not known ahead of time, and all comments denoted by a '%' character), the following code is fast and easy to read:
f = fopen('hexdata.txt', 'rt');
A = textscan(f, '%s', 'Delimiter', '\n', 'MultipleDelimsAsOne', '1', 'CollectOutput', '1', 'CommentStyle', '%');
fclose(f);
A = A{1};
data = sscanf(A{1}, '%i')';
data = repmat(data, length(A), 1);
for ctr = 2:length(A)
data(ctr,:) = sscanf(A{ctr}, '%i')';
end