Variable Width Columns in .txt Files - matlab

I have a function that takes data and imports that data into a text file. The issue that I am having is with formatting. I want to be able to set the width of the columns based on the widest array of characters in that column. So, in the code below I have labels and then data. My idea would be to take the length of each individually and find the largest value. Say the second column labels has 15 chars and that is longer than any data array, then I want to set the width of that column to 15 + 3 (white spaces) making it 18. If column 3 had a max of 8 chars for a member of data, then I would like to set the width to 11. I have found plenty of literature on fixed width, and I found that I could do '-*s', *width, colLabels; but I am having difficulty figuring out how to implement that.
Below is my code and it doesn't fail but it takes forever and then won't open because there is not enough memory. I have really tried to work through this to no avail.
Thanks in advance and if there is any other information I can provide, then let me know.
for col = 1:length(this.colLabels) % iterate through columns
colLen = length(this.colLabels{col}); % find the longest string in labels
v = max(this.data(:,col)); % find the longest double in data
n = num2str(v, '%.4f'); % precision of 4 after decimal place
dataLen = length(n);
% find max width for column and add white space
if colLen > dataLen
colWidth = colLen + 3;
else
colWidth = dataLen + 3;
end
% print it
fprintf(fid, '%-*s', this.colWidth, this.colLabels{col}); % write col position i
fprintf(fid, '\n');
fprintf(fid, '%-*s', this.colWidth, this.colUnits{col});% write unit position i
fprintf(fid, '\n');
fprintf(fid, '%-*s', this.colWidth, this.data(:,col)); % write all rows of data in column i
end

There are few places where you are making mistakes:
The size of number is not necessarily related to its size when printed. Consider 1.1234 and 1000, one of these is a larger string and the other is a larger number. This may or may not matter for your data ...
Two, it is best to use the correct format strings when printing to string. %s is for strings, not numbers.
Perhaps most importantly, text appears on multiple lines because of the newline character which ends one line and starts another. This means you essentially have to write one row at a time, not one column at a time.
I tend to prefer creating the text in memory then writing to a file. The following isn't the cleanest implementation but it works.
this.colLabels = {'test' 'cheese' 'variable' 'really long string'};
this.colUnits = {'ml' 'cm' 'C' 'kg'};
n_columns = length(this.colLabels);
%Fake data
this.data = reshape(1:n_columns*5,5,n_columns);
this.data(1) = 1.2345678;
this.data(5) = 1000; %larger number but smaller string
%Format as desired ...
string_data = arrayfun(#(x) sprintf('%g',x),this.data,'un',0);
string_data = [this.colLabels; this.colUnits; string_data];
%Add on newlines ...
%In newer versions you can use newline instead of char(10)
string_data(:,end+1) = {char(10)};
string_lengths = cellfun('length',string_data);
max_col_widths = max(string_lengths,[],1);
%In newer versions you can use singleton expansion, but beware
n_spaces_add = bsxfun(#minus,max_col_widths,string_lengths);
%left justify filling with spaces
final_strings = cellfun(#(x,y) [x blanks(y)],string_data,num2cell(n_spaces_add),'un',0);
%Optional delimiter between columns
%Don't add delimiter for last column or for newline column
final_strings(:,1:end-2) = cellfun(#(x) [x ', '],final_strings(:,1:end-2),'un',0);
%Let's skip last newline
final_strings{end,end} = '';
%transpose for next line so that (:) goes by row first, not column
%Normally (:) linearizes by column first
final_strings = final_strings';
%concatenate all cells together
entire_string = [final_strings{:}];
%Write this to disk fprintf(fid,'%s',entire_string);

The data in the text file is stored one line after the other, so you cannot write column by column. You need first to determine the width of the columns and write the label/unit header, then write all the data. All we need to have is a proper format string for fprintf: fixed width format and fprintf is extremely useful for exporting column delimited data.
The first part of the code is ok in order to determine the width of the columns (assuming the data only has positive samples). You only need to store it in an array.
nCol=length(this.colLabels);
colWidth = zeros(1,nCol);
for col = 1:nCol
colLen = length(this.colLabels{col}); % find the longest string in labels
v = max(this.data(:,col)); % find the longest double in data
n = num2str(v, '%.4f'); % precision of 4 after decimal place
dataLen = length(n);
% find max width for column and add white space
colWidth(col)=max(colLen,dataLen);
end
Now, we need to build format string for the labels and data, to use with sprintf. The format string will look like '%6s %8s %10s\n' for the header and '%6.4f %8.4f %10.4f\n' for the data.
fmtHeader=sprintf('%%%ds ',colWidth);
fmtData=sprintf('%%%d.4f ',colWidth);
%Trim the triple space at the end and add the newline
fmtHeader=[fmtHeader(1:end-3) '\n'];
fmtData =[fmtData(1:end-3) '\n'];
We use the fact that, when sprintf is given an array as input, it will iterate through all the values to produce a long string. We can use the same trick to write the data, but singe we write line by line and Matlab stores data in column major order, a transpose is necessary.
fid=fopen('myFile.txt');
fprintf(fid,fmtHeader,this.colLabels{:});
fprintf(fid,fmtHeader,this.colUnits{:});
fprintf(fid,fmtData,transpose(this.data));
fclose(fid);
For the headers, the cell can be converted to a comma separated list with {:}. This is the same as writing fprintf(fid,fmtHeader,this.colLabels{1},this.colLabels{2},...)
Using the same test data from #Jimbo 's answer and fid=1; to output the fprintf to the screen the code gives:
test cheese variable really long string
ml cm C kg
1.2346 6.0000 11.0000 16.0000
2.0000 7.0000 12.0000 17.0000
3.0000 8.0000 13.0000 18.0000
4.0000 9.0000 14.0000 19.0000
1000.0000 10.0000 15.0000 20.0000
Finally, the most compact version of the code is:
fid=1; %print to screen for test purpose
colWidth =max( cellfun(#length,this.colLabels(:)') , max(1+floor(log10(max(this.data,[],1))) , 1) + 5); %log10 to count digits, +5 for the dot and decimal digits ; works for data >=0 only
fprintf(fid,[sprintf('%%%ds ',colWidth(1:end-1)) sprintf('%%%ds\n',colWidth(end))],this.colLabels{:},this.colUnits{:}); %print header
fprintf(fid,[sprintf('%%%d.4f ',colWidth(1:end-1)) sprintf('%%%d.4f\n',colWidth(end))],this.data'); %print data

Related

Optimizing reading the data in Matlab

I have a large data file with a text formatted as a single column with n rows. Each row is either a real number or a string with a value of: No Data. I have imported this text as a nx1 cell named Data. Now I want to filter out the data and to create a nx1 array out of it with NaN values instead of No data. I have managed to do it using a simple cycle (see below), the problem is that it is quite slow.
z = zeros(n,1);
for i = 1:n
if Data{i}(1)~='N'
z(i) = str2double(Data{i});
else
z(i) = NaN;
end
end
Is there a way to optimize it?
Actually, the whole parsing can be performed with a one-liner using a properly parametrized readtable function call (no iterations, no sanitization, no conversion, etc...):
data = readtable('data.txt','Delimiter','\n','Format','%f','ReadVariableNames',false,'TreatAsEmpty','No data');
Here is the content of the text file I used as a template for my test:
9.343410
11.54300
6.733000
-135.210
No data
34.23000
0.550001
No data
1.535000
-0.00012
7.244000
9.999999
34.00000
No data
And here is the output (which can be retrieved in the form of a vector of doubles using data.Var1):
ans =
9.34341
11.543
6.733
-135.21
NaN
34.23
0.550001
NaN
1.535
-0.00012
7.244
9.999999
34
NaN
Delimiter: specified as a line break since you are working with a single column... this prevents No data to produce two columns because of the whitespace.
Format: you want numerical values.
TreatAsEmpty: this tells the function to treat a specific string as empty, and empty doubles are set to NaN by default.
If you run this you can find out which approach is faster. It creates an 11MB text file and reads it with the various approaches.
filename = 'data.txt';
%% generate data
fid = fopen(filename,'wt');
N = 1E6;
for ct = 1:N
val = rand(1);
if val<0.01
fwrite(fid,sprintf('%s\n','No Data'));
else
fwrite(fid,sprintf('%f\n',val*1000));
end
end
fclose(fid)
%% Tommaso Belluzzo
tic
data = readtable(filename,'Delimiter','\n','Format','%f','ReadVariableNames',false,'TreatAsEmpty','No Data');
toc
%% Camilo Rada
tic
[txtMat, nLines]=txt2mat(filename);
NoData=txtMat(:,1)=='N';
z = zeros(nLines,1);
z(NoData)=nan;
toc
%% Gelliant
tic
fid = fopen(filename,'rt');
z= textscan(fid, '%f', 'Delimiter','\n', 'whitespace',' ', 'TreatAsEmpty','No Data', 'EndOfLine','\n','TextType','char');
z=z{1};
fclose(fid);
toc
result:
Elapsed time is 0.273248 seconds.
Elapsed time is 0.304987 seconds.
Elapsed time is 0.206315 seconds.
txt2mat is slow, even without converting resulting string matrix to numbers it is outperformed by readtable and textscan. textscan is slightly faster than readtable. Probably because it skips some of the internal sanity checks and does not convert the resulting data to a table.
Depending of how big are your files and how often you read such files, you might want to go beyond readtable, that could be quite slow.
EDIT: After tests, with a file this simple the method below provide no advantages. The method was developed to read RINEX files, that are large and complex in the sense that the are aphanumeric with different numbers of columns and different delimiters in different rows.
The most efficient way I've found, is to read the whole file as a char matrix, then you can easily find you "No data" lines. And if your real numbers are formatted with fix width you can transform them from char into numbers in a way much more efficient than str2double or similar functions.
The function I wrote to read a text file into a char matrix is:
function [txtMat, nLines]=txt2mat(filename)
% txt2mat Read the content of a text file to a char matrix
% Read all the content of a text file to a matrix as wide as the longest
% line on the file. Shorter lines are padded with blank spaces. New lines
% are not included in the output.
% New lines are identified by new line \n characters.
% Reading the whole file in a string
fid=fopen(filename,'r');
fileData = char(fread(fid));
fclose(fid);
% Finding new lines positions
newLines= fileData==sprintf('\n');
linesEndPos=find(newLines)-1;
% Calculating number of lines
nLines=length(linesEndPos);
% Calculating the width (number of characters) of each line
linesWidth=diff([-1; linesEndPos])-1;
% Number of characters per row including new lines
charsPerRow=max(linesWidth)+1;
% Initializing output var with blank spaces
txtMat=char(zeros(charsPerRow,nLines,'uint8')+' ');
% Computing a logical index to all characters of the input string to
% their final positions
charIdx=false(charsPerRow,nLines);
% Indexes of all new lines
linearInd = sub2ind(size(txtMat), (linesWidth+1)', 1:nLines);
charIdx(linearInd)=true;
charIdx=cumsum(charIdx)==0;
% Filling output matrix
txtMat(charIdx)=fileData(~newLines);
% Cropping the last row coresponding to new lines characters and transposing
txtMat=txtMat(1:end-1,:)';
end
Then, once you have all your data in a matrix (let's assume it is named txtMat), you can do:
NoData=txtMat(:,1)=='N';
And if your number fields have fix width, you can transform them to numbers way more efficiently than str2num with something like
values=((txtMat(:,1:10)-'0')*[1e6; 1e5; 1e4; 1e3; 1e2; 10; 1; 0; 1e-1; 1e-2]);
Where I've assumed the numbers have 7 digits and two decimal places, but you can easily adapt it for your case.
And to finish you need to set the NaN values with:
values(NoData)=NaN;
This is more cumbersome than readtable or similar functions, but if you are looking to optimize the reading, this is WAY faster. And if you don't have fix width numbers you can still do it this way by adding a couple lines to count the number of digits and find the place of the decimal point before doing the conversion, but that will slow down things a little bit. However, I think it will still be faster.

Skip all type of characters matlab while using fscanf or sscanf

I need to read a text that has mix of numerical values and characters. Here is an example:
% Loc : LAT = -19.6423 LON = -70.817 DEP = 21.5451196625
I only need to read numerical fields.
Typically I used this:
x = fgetl(fid);
out = sscanf(x,'%% Loc : LAT = %f LON = %f DEP = %f\n');
It works but the problem is that not all the files have fixed format and sometimes letters are written in upper or lower cases. In such cases what I did does not work.
I tried skipping all characters using
out = sscanf(x,'%[a-zA-Z.=\t\b]s %f %[a-zA-Z.=\t\b]s %f %[a-zA-Z.=\t\b]s %f\n');
but it does not work!
Please note that file lines are not the same and I have different number of numerical field in each line of file.
I'm still a little unclear on your file format, but it seems like you could do this much easier using textscan instead of the lower level functions.
Something like this should work:
while (~feof(fid))
textscan(fid, '%s :'); % Read the part of the line through the colon
data = textscan(fid, '%s = %f');
% Do something with the data here
end
The variable fid is an file identifier that you would have to have gotten from calling fopen and you'll need to call fclose when you're done.
I don't think this is going to exactly fix your problem, but hopefully it will get you on a track that's much shorter and cleaner. You'll have to play with this to make sure that you actually get to the end of file, for example, and that there are not corner cases that trip up the pattern matching.
*scanf() uses a format string like "%d", not a multi-character constant like '%d'
Detail: " vs. '.
"%[] does not use a trailing 's' as OP used in '%[a-zA-Z.=\t\b]s'
"%n" records the int count of characters scanned so far.
Suggest
// Adjust these as needed
#define SKIPCHAR " %*[a-zA-Z.%:]"
#define EQUALFLOAT " =%f"
int n = 0;
float lat, lon, dep;
sscanf(x, SKIPCHAR EQUALFLOAT SKIPCHAR EQUALFLOAT SKIPCHAR EQUALFLOAT " %n",
&lat, &lon, &dep, &n);
if (n > 0 && x[n] == '\0') Success();
else Fail();
To cope with different number of numbers in a line:
#define XN 100
float x[XN];
char *p = x;
size_t i;
for (i=0; i<XN; i++) {
int n = 0;
sscanf(p, SKIPCHAR " %n", &n);
p += n;
n = 0;
sscanf(p, EQUALFLOAT " %n", &x[i], &n);
if (n == 0) break;
p += n;
}
I've found a possible solution even if it is for sure not "elegant", nevertheless seems working.
It is based on the following process:
read the file line by line using fgets
parse each string using strtok
try converting each token to a numebr with str2num
if it is actually a "number" str2num (i. e. if str2num does not returns an empty array) insert the number in the output matrix
The output matrix is initialized (to NaN) at the beginning of the script as big enough to have:
a number of rows greater or equal to the number of rows of the input file (if it is not known in advance, a "reasonable" value should be defined)
a number of columns greater or equal to the maximum number of numeric values that can be present in a row of the input file (if it is not known in advance, a "reasonable" value should be defined).
Once you've read all the input file, you can "clean" the the output matrix by removing the exceeding full NaN rows and columns.
In the following you can find the script, the input file I've used and the output matrix (looking at it should make more clear the reason for having initialized it to NaN - I hope).
Notice that the identification of the number and their extraction (using strtok) is based on the format of your the example row: in particular, for example, it is based on the fact that all the token of the string are separated by a space.
This means that the code is not able to identify =123.456 as number.
If your input file has token such as =123.456, the code has to be modified.
% Initialize rows counter
r_cnt=0;
% INitialize column counter
c_cnt=0;
% Define the number of rows of the input file (if it not known in advance,
% put a "reasonable" value) - Used to initialize the output matrix
file_rows=5;
% Define the number of numeric values to be extracted from the input file
% (if it not known in advance, put a "reasonable" value) - Used to
% initialize the output matrix
max_col=5;
% Initialize the variable holding the maximum number of column. Used to
% "clean" the output matrix
max_n_col=-1;
% Initialize the output matrix
m=nan(file_rows,max_col);
% Open the input file
fp=fopen('char_and_num.txt','rt');
% Get the first row
tline = fgets(fp);
% Loop to read line by line the input file
while ischar(tline)
% Increment the row counter
r_cnt=r_cnt+1;
% Parse the line looking for numeric values
while(true)
[str, tline] = strtok(tline);
if(isempty(str))
break
end
% Try to conver the string into a number
tmp_val=str2num(str);
if(~isempty(tmp_val))
% If the token is a number, increment the column counter and
% insert the number in the output matrix
c_cnt=c_cnt+1;
m(r_cnt,c_cnt)=tmp_val;
end
end
% Identify the maximum number not NaN column in the in the output matrix
% so far
max_n_col=max(max_n_col,c_cnt);
% Reset the column counter before nest iteration
c_cnt=0;
% Read next line of the input file
tline = fgets(fp);
end
% After having read all the input file, close it
fclose(fp)
% Clean the output matrix removing the exceeding full NaN rows and columns
m(r_cnt+1:end,:)=[];
m(:,max_n_col+1:end)=[];
m
Input file
% Loc : LAT = -19.6423 LON = -70.817 DEP = 21.5451196625
% Loc : xxx = -1.234 yyy = -70.000 WIDTH = 333.369 DEP = 456.5451196625
% Loc : zzz = 1.23
Output
m =
-19.6423 -70.8170 21.5451 NaN
-1.2340 -70.0000 333.3690 456.5451
1.2300 NaN NaN NaN
Hope this helps.

Matlab Clipboard Precision - Format long -

I need to copy paste several matrix from matlab to excel so i did my researches and i've found a really amazing script called num2clip that brings the selected array to the clipboard.
The only problem is that the numbers format is short, when i would like it to be long.
I suspect the "double" type used in the script but i'm still new to matlab so i do have some important lacks.
Here is the script that i've found, what do i have to do according to you in order to keep the long input format ?
function arraystring = num2clip(array)
function arraystring = num2clip(array)
%NUM2CLIP copies a numerical-array to the clipboard
%
% ARRAYSTRING = NUM2CLIP(ARRAY)
%
% Copies the numerical array ARRAY to the clipboard as a tab-separated
% string. This format is suitable for direct pasting to Excel and other
% programs.
%
% The tab-separated result is returned as ARRAYSTRING. This
% functionality has been included for completeness.
%
%Author: Grigor Browning
%Last update: 02-Sept-2005
%convert the numerical array to a string array
%note that num2str pads the output array with space characters to account
%for differing numbers of digits in each index entry
arraystring = num2str(array);
%add a carrige return to the end of each row
arraystring(:,end+1) = char(10);
%reshape the array to a single line
%note that the reshape function reshape is column based so to reshape by
%rows one must use the inverse of the matrix
%reshape the array to a single line
arraystring = reshape(arraystring',1,prod(size(arraystring)));
%create a copy of arraystring shifted right by one space character
arraystringshift = [' ',arraystring];
%add a space to the end of arraystring to make it the same length as
%arraystringshift
arraystring = [arraystring,' '];
%now remove the additional space charaters - keeping a single space
%charater after each 'numerical' entry
arraystring = arraystring((double(arraystring)~=32 |...
double(arraystringshift)~=32) &...
~(double(arraystringshift==10) &...
double(arraystring)==32) );
%convert the space characters to tab characters
arraystring(double(arraystring)==32) = char(9);
format long e
%copy the result to the clipboard ready for pasting
clipboard('copy',arraystring);
Best regards.
Just replace the line with:
arraystring = num2str(array) ;
to a line like that:
arraystring = num2str(array,'%15.15f') ;
This will give you the maximum precision you can reach with the double type (15 digits).
Look at the num2str documentation for more custom format.
Thank you Hoki for your participation.
I didnt had the time to go through all the documentation wich is great by the way.
When i tried your solution, the copied data was all inserted on one cell, i just had to change :
arraystring = num2str(array,'%15.15f') ;
to
arraystring = num2str(array,15) ;
Have a nice day !

Converting a comma separated filed to a matlab matrix

I have a comma separated file in the format:
Col1Name,Col1Val1,Col1Val2,Col1Val3,...Col1ValN,Col2Name,Col2Val1,...Col2ValN,...,ColMName,ColMVal1,...,ColMValN
My question is, how can I convert this file into something Matlab can treat as a matrix, and how would I go about using this matrix in a file? I supposed I could some scripting language to format the file into matlab matrix format and copy it, but the file is rather large (~7mb).
Thanks!
Sorry for the edit:
The file format is:
Col1Name;Col2Name;Col3Name;...;ColNName
Col1Val1;Col2Val2;Col3Val3;...;ColNVal1
...
Col1ValM;Col2ValM;Col3ValM;...;VolNValM
Here is some actual data:
Press;Temp.;CondF;Cond20;O2%;O2ppm;pH;NO3;Chl(a);PhycoEr;PhycoCy;PAR;DATE;TIME;excel.date;date.time
0.96;20.011;432.1;431.9;125.1;11.34;8.999;134;9.2;2.53;1.85;16.302;08.06.2011;12:01:52;40702;40702.0.5
1;20.011;433;432.8;125;11.34;9;133.7;8.19;3.32;2.02;17.06;08.06.2011;12:01:54;40702;40702.0.5
1.1;20.012;432.7;432.4;125.1;11.34;9;133.8;8.35;2.13;2.2;19.007;08.06.2011;12:01:55;40702;40702.0.5
1.2;20.012;432.8;432.5;125.2;11.35;9.001;133.8;8.45;2.95;1.95;21.054;08.06.2011;12:01:56;40702;40702.0.5
1.3;20.012;432.7;432.4;125.4;11.37;9.002;133.7;8.62;3.17;1.87;22.934;08.06.2011;12:01:57;40702;40702.0.5
1.4;20.007;432.1;431.9;125.2;11.35;9.003;133.7;9.48;4.17;1.6;24.828;08.06.2011;12:01:58;40702;40702.0.5
1.5;19.997;432.3;432.2;124.9;11.33;9.003;133.8;8.5;3.84;1.79;27.327;08.06.2011;12:01:59;40702;40702.0.5
1.6;20;432.8;432.6;124.5;11.29;9.003;133.6;8.57;3.22;1.86;30.259;08.06.2011;12:02:00;40702;40702.0.5
1.7;19.99;431.9;431.9;124.4;11.28;9.002;133.6;8.79;3.7;1.81;35.152;08.06.2011;12:02:02;40702;40702.0.5
1.8;19.994;432.1;432.1;124.4;11.28;9.002;133.6;8.58;3.41;1.84;39.098;08.06.2011;12:02:03;40702;40702.0.5
1.9;19.993;433;432.9;124.6;11.3;9.002;133.6;8.59;3.45;5.53;45.488;08.06.2011;12:02:04;40702;40702.0.5
2;19.994;433;432.9;124.8;11.32;9.002;133.5;8.6;2.76;1.99;50.646;08.06.2011;12:02:05;40702;40702.0.5
If you don't know number of rows and columns up front, you can't use previous solution. Use this instead.
7 Mb is not large, it is small. This is the 21st century.
To read in to a matlab matrix:
text = fileread('file.name'); % a string with the entire file contents in it. 7 Mb is no big deal.
NAMES = {}; % we'll record column names here
VALUES = []; % this will be the matrix of values
while text(end) = ','
text(end)=[]; % elimnate any trailing commas
end
commas = find(text==','); % Index all the commas
commas = [0;commas(:);length(commas)+1] % put fake commas before and after text to simplify loop
col = 0; % which column are we in
I = 1;
while I<length(commas)
txt = text(commas(I)+1:commas(I+1)-1);
I = I+1;
num = str2double(txt);
if isnan(num) % this means it must be a column name
NAMES{end+1,1} = txt;
col = col+1; % can you believe Matlab doesn't support col++ ???
row = 1; % back to the top at each new column
continue % we have dealt with this txt, its not a num so ... next
end
% if we made it here we have a number
VALUES(row,col) = num;
end
Then you can save your matlab matrix VALUES and also the header names if you want them in matlab format NAMES into matlab format file
save('mymatrix.mat','VALUES','NAMES'); % saves matrix and column names to .mat file
You get stuff back in to matlab when you want it from the file by:
load mymatrix.mat; % loads VALUES and NAMES from .mat file
Some limitations:
You can't use commas in your column header names.
You cannot "name" a column something like "898.2" or anything which can be read as a double number, it will be read in as a number.
If your columns have different lengths, the shorter ones will be padded with zeros to the length of the longest column.
That's all I can think of.

Problem (bug?) loading hexadecimal data into MATLAB

I'm trying to load the following ascii file into MATLAB using load()
% some comment
1 0xc661
2 0xd661
3 0xe661
(This is actually a simplified file. The actual file I'm trying to load contains an undefined number of columns and an undefined number of comment lines at the beginning, which is why the load function was attractive)
For some strange reason, I obtain the following:
K>> data = load('testMixed.txt')
data =
1 50785
2 58977
3 58977
I've observed that the problem occurs anytime there's a "d" in the hexadecimal number.
Direct hex2dec conversion works properly:
K>> hex2dec('d661')
ans =
54881
importdata seems to have the same conversion issue, and so does the ImportWizard:
K>> importdata('testMixed.txt')
ans =
1 50785
2 58977
3 58977
Is that a bug, am I using the load function in some prohibited way, or is there something obvious I'm overlooking?
Are there workarounds around the problem, save from reimplementing the file parsing on my own?
Edited my input file to better reflect my actual file format. I had a bit oversimplified in my original question.
"GOLF" ANSWER:
This starts with the answer from mtrw and shortens it further:
fid = fopen('testMixed.txt','rt');
data = textscan(fid,'%s','Delimiter','\n','MultipleDelimsAsOne','1',...
'CommentStyle','%');
fclose(fid);
data = strcat(data{1},{' '});
data = sscanf([data{:}],'%i',[sum(isspace(data{1})) inf]).';
PREVIOUS ANSWER:
My first thought was to use TEXTSCAN, since it has an option that allows you to ignore certain lines as comments when they start with a given character (like %). However, TEXTSCAN doesn't appear to handle numbers in hexadecimal format well. Here's another option:
fid = fopen('testMixed.txt','r'); % Open file
% First, read all the comment lines (lines that start with '%'):
comments = {};
position = 0;
nextLine = fgetl(fid); % Read the first line
while strcmp(nextLine(1),'%')
comments = [comments; {nextLine}]; % Collect the comments
position = ftell(fid); % Get the file pointer position
nextLine = fgetl(fid); % Read the next line
end
fseek(fid,position,-1); % Rewind to beginning of last line read
% Read numerical data:
nCol = sum(isspace(nextLine))+1; % Get the number of columns
data = fscanf(fid,'%i',[nCol inf]).'; % Note '%i' works for all integer formats
fclose(fid); % Close file
This will work for an arbitrary number of comments at the beginning of the file. The computation to get the number of columns was inspired by Jacob's answer.
New:
This is the best I could come up with. It should work for any number of comment lines and columns. You'll have to do the rest yourself if there are strings, etc.
% Define the characters representing the start of the commented line
% and the delimiter
COMMENT_START = '%%';
DELIMITER = ' ';
% Open the file
fid = fopen('testMixed.txt');
% Read each line till we reach the data
l = COMMENT_START;
while(l(1)==COMMENT_START)
l = fgetl(fid);
end
% Compute the number of columns
cols = sum(l==DELIMITER)+1;
% Split the first line
split_l = regexp(l,' ','split');
% Read all the data
A = textscan(fid,'%s');
% Compute the number of rows
rows = numel(A{:})/cols;
% Close the file
fclose(fid);
% Assemble all the data into a matrix of cell strings
DATA = [split_l ; reshape(A{:},[cols rows])']; %' adding this to make it pretty in SO
% Recognize each column and process accordingly
% by analyzing each element in the first row
numeric_data = zeros(size(DATA));
for i=1:cols
str = DATA(1,i);
% If there is no '0x' present
if isempty(findstr(str{1},'0x')) == true
% This is a number
numeric_data(:,i) = str2num(char(DATA(:,i)));
else
% This is a hexadecimal number
col = char(DATA(:,i));
numeric_data(:,i) = hex2dec(col(:,3:end));
end
end
% Display the data
format short g;
disp(numeric_data)
This works for data like this:
% Comment 1
% Comment 2
1.2 0xc661 10 0xa661
2 0xd661 20 0xb661
3 0xe661 30 0xc661
Output:
1.2 50785 10 42593
2 54881 20 46689
3 58977 30 50785
OLD:
Yeah, I don't think LOAD is the way to go. You could try:
a = char(importdata('testHexa.txt'));
a = hex2dec(a(:,3:end));
This is based on both gnovice's and Jacob's answers, and is a "best of breed"
For files like:
% this is my comment
% this is my other comment
1 0xc661 123
2 0xd661 456
% surprise comment
3 0xe661 789
4 0xb661 1234567
(where the number of columns within the file MUST be the same, but not known ahead of time, and all comments denoted by a '%' character), the following code is fast and easy to read:
f = fopen('hexdata.txt', 'rt');
A = textscan(f, '%s', 'Delimiter', '\n', 'MultipleDelimsAsOne', '1', 'CollectOutput', '1', 'CommentStyle', '%');
fclose(f);
A = A{1};
data = sscanf(A{1}, '%i')';
data = repmat(data, length(A), 1);
for ctr = 2:length(A)
data(ctr,:) = sscanf(A{ctr}, '%i')';
end