Matlab: How to read the commented portion of an ASCII file

I have an ASCII file whose first couple of hundred lines are comments that give some information about the data; the data follows them. For example, these are a couple of lines I snipped from the many commented lines:
Right now I read only the data, without the comments, using load:
filename = uigetfile('*.dat', 'Select Input data');
Data = load(filename, '-ascii');
How can I read the commented lines (which end just before the data starts) and pick out particular comments based on identifiers such as program name and version, creation date, etc.?

Use textscan to read the lines into a cell array:
fid = fopen(filename, 'r');
C = textscan(fid, '%s', 'Delimiter', '\n');
C = C{:}; %// Flatten cell array
fclose(fid);
Now you can use regexp to manipulate the textual data. For instance, to find the comment lines that contain the string "Creation date", you can do this:
idx = ~cellfun('isempty', regexp(C, '^\s*%.*Creation date'));
where ^\s*% matches the percent sign (%) at the beginning of the line along with any leading whitespace, and .* matches any number of characters up to the occurrence of "Creation date". You can adjust the regular expression pattern to your liking.
The resulting variable idx stores a logical (i.e., boolean) vector with 1s at the positions of the lines matching the pattern (you can obtain their explicit numerical indices with find(idx)). Next, you can filter those lines with C(idx) or iterate over them with a for loop.
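As a rough sketch of the filtering step, the snippet below writes a small sample file and then pulls the value out of the "Creation date" comment with a regexp token. The header lines, the file name and the "key: value" layout are assumptions made for illustration, since the real header format is not shown in the question.

```matlab
% Build a small sample file (hypothetical header layout, for illustration only)
tmpFile = [tempname '.dat'];
fid = fopen(tmpFile, 'w');
fprintf(fid, '%% Program name: DemoTool\n');
fprintf(fid, '%% Creation date: 2014-01-01\n');
fprintf(fid, '1 2 3\n4 5 6\n');
fclose(fid);

% Read every line of the file into a cell array of character vectors
fid = fopen(tmpFile, 'r');
C = textscan(fid, '%s', 'Delimiter', '\n');
fclose(fid);
C = C{1};

% Keep only the comment lines that mention "Creation date"
idx = ~cellfun('isempty', regexp(C, '^\s*%.*Creation date'));
dateLines = C(idx);

% Capture whatever follows the colon as a token
tok = regexp(dateLines{1}, 'Creation date:\s*(.*)$', 'tokens', 'once');
creationDate = strtrim(tok{1});

delete(tmpFile);
```

The same token pattern can be repeated for other identifiers, such as the program name, by changing the literal text in front of the colon.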

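fid = fopen(filename);
nHeaderRows = 412; %// number of comment lines in this particular file
headerCell = cell(nHeaderRows, 1);
for i = 1:nHeaderRows
    headerCell{i} = fgets(fid); %// fgets keeps the line terminator; fgetl would drop it
end
fclose(fid);
headerText = char(headerCell); %// padded character matrix, one row per header line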
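If the number of header lines is not known in advance, one possible variation is to keep reading until the first line that does not start with a percent sign. This is only a sketch: it assumes every comment line begins with % (optionally after whitespace), and the sample file contents are made up.

```matlab
% Build a small sample file (hypothetical contents, for illustration only)
tmpFile = [tempname '.dat'];
fid = fopen(tmpFile, 'w');
fprintf(fid, '%% Program name: DemoTool\n%% Creation date: 2014-01-01\n1 2 3\n4 5 6\n');
fclose(fid);

% Collect lines until the first one that is not a comment
fid = fopen(tmpFile, 'r');
headerCell = {};
tline = fgetl(fid);
while ischar(tline) && ~isempty(regexp(tline, '^\s*%', 'once'))
    headerCell{end+1, 1} = tline; %#ok<SAGROW> header length is unknown up front
    tline = fgetl(fid);
end
fclose(fid);

nHeaderRows = numel(headerCell);
delete(tmpFile);
```

After the loop, nHeaderRows holds the header length, so nothing has to be hard-coded per file.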

Related

Convert .csv to .out (complex numbers)

I have a csv file that has complex numbers.
This is sample of some numbers I have in the csv file:
(0.12825663763789857+0.20327998150393212j),(0.21890748607218197+0.160563964013564j),(0.28205414129281525+0.09884068776334366j),(0.030927026479380615+0.26334550583848626j)
I want to read this file and then save in (.out) file all the real parts in the first column and all the imaginary parts in the second column (without the imaginary letter j).
Here is one attempt. It is slightly more complicated due to the ( and ) that surround your numbers.
First, use textscan to read the file. Since I guess you don't know how many numbers are in the file, read everything into a single string. This will work with multiple lines, too:
filename = 'data.csv';
fid = fopen(filename);
content = textscan(fid, '%s');
fclose(fid);
For this purpose, content now is a slightly weird cell array (look at the textscan-docs for details). Just initialize the variable nums which will store the numbers and loop through content (if you know a bit more about your csv file, you might pre-allocate nums):
nums = [];
for c1 = 1:numel(content{1})
Next, split the string at every occurrence of ,:
string_list = strsplit(content{1}{c1},',');
This gives another cell array. Loop through it to convert the strings to numbers (and end the outer loop):
for c2 = 1 : numel(string_list)
nums(end+1) = str2num(string_list{c2});
end
end
Last, just store the real and the imaginary part of the numbers in separate columns:
out = [];
out(:,1) = real(nums);
out(:,2) = imag(nums);
and save it to data.out.
Update: As you mentioned precision, you could use
dlmwrite('data.out', out, 'precision','%.20f');
However, here you need to understand the floating point representation in Matlab. In particular, try to understand the following:
>> a = 0.12825663763789857
a =
0.1283
>> fprintf('%.20f\n', a)
0.12825663763789857397
>> eps(a)
ans =
2.7756e-17
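As a small sketch of what the session above implies: a double carries roughly 16 to 17 significant decimal digits, so printing 20 decimals shows digits that were never in the file, while 17 significant digits are enough to recover exactly the same double. The variable names below are illustrative.

```matlab
a = 0.12825663763789857;   % value taken from the sample data

% 17 significant digits are enough to reproduce a double exactly
s = sprintf('%.17g', a);
b = str2double(s);
roundTrips = (b == a);

% Spacing between a and the next larger double; digits below this are noise
spacing = eps(a);
```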
Note that one could have done this without converting the strings to numbers, but the way above allows you to use the data in Matlab instead of just saving it.
Here is an attempt that does not convert your strings to numbers, so one does not have to deal with precision. It works with negative real and imaginary parts, too. + signs are removed when written to the new file, - signs are preserved:
filename = 'data.csv';
fid = fopen(filename);
content = textscan(fid, '%s');
fclose(fid);
fid = fopen('data.out','w');
pattern = '(?<real>-?\d+\.\d+)(?<imag>[+-]\d+\.\d+)j'; %// \. matches a literal decimal point
for c1 = 1:numel(content{1})
result = regexp(content{1}{c1}, pattern, 'names');
for c2 = 1:numel(result)
fprintf(fid, '%s,%s\n', strrep(result(c2).real,'+',''), strrep(result(c2).imag,'+',''));
end
end
fclose(fid);
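As a variation on the first approach, the inner loop and str2num can probably be replaced by a vectorised str2double call once the parentheses are stripped. This is a sketch under the assumption that every entry looks like the sample line in the question; the output file name is a placeholder.

```matlab
% Sample line in the format shown in the question
txt = '(0.12825663763789857+0.20327998150393212j),(0.21890748607218197+0.160563964013564j)';

parts = strsplit(txt, ',');             % one cell per complex number
parts = regexprep(parts, '[()]', '');   % strip the surrounding parentheses
nums  = str2double(parts);              % str2double accepts i or j as the imaginary unit
out   = [real(nums(:)), imag(nums(:))]; % column 1: real parts, column 2: imaginary parts

% Write two space-separated columns with 17 significant digits
outFile = [tempname '.out'];            % hypothetical output location
fid = fopen(outFile, 'w');
fprintf(fid, '%.17g %.17g\n', out.');   % transpose so each row is printed as one pair
fclose(fid);
```

Unlike str2num, str2double does not evaluate its input as code, which is a little safer when the file contents are not fully trusted.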

String vector to array

I am trying to make a script in Matlab that pulls data from a file and generates an array of data. Since the data is a string I've tried to split it into columns, take the transpose, and split it into columns again to populate an array.
When I run the script I don't get any errors, but I also don't get any useful data. I tell it to display the final vector (Full_Array) and I get {1×4 cell} 8 times. When I try to use strsplit I get the error:
'Error using strsplit (line 80) First input must be either a character vector or a string scalar.'
I'm pretty new to Matlab and I honestly have no clue how to fix it. After reading through similar threads and the documentation, I'm out of ideas. I've attached the code and the data to read in below. Thank you.
clear
File_Name = uigetfile; %Brings up windows file browser to locate .xyz file
Open_File = fopen(File_Name); %Opens the file given by File_Name
File2Vector = fscanf(Open_File,'%s'); %Prints the contents of the file to a 1xN vector
Vector2ColumnArray = strsplit(File2Vector,';'); %Splits the string vector from
%File2Vector into columns, forming an array
Transpose = transpose(Vector2ColumnArray); %Takes the transpose of Vector2ColumnArray
%making a column array into a row array
FullArray = regexp(Transpose, ',', 'split');
The data I am trying to read in comes from a .xyz file that I have titled methylformate.xyz, here is the data:
O2,-0.23799,0.65588,-0.69492;
O1,0.50665,0.83915,1.47685;
C2,-0.32101,2.08033,-0.75096;
C1,0.19676,0.17984,0.49796;
H4,0.66596,2.52843,-0.59862;
H3,-0.67826,2.36025,-1.74587;
H2,-1.03479,2.45249,-0.00927;
H1,0.23043,-0.91981,0.45346;
When I started using Matlab I also had problems with the data structure. The last line
FullArray = regexp(Transpose, ',', 'split');
splits each line and stores it in a cell array. In order to access the individual strings you have to index with curly brackets into FullArray:
FullArray{1}{1} % -> 'O2'
FullArray{1}{2} % -> '-0.23799'
FullArray{2}{1} % -> 'O1'
FullArray{2}{2} % -> '0.50665'
Here the first index corresponds to the row and the second to the particular element within that row.
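To get from the nested cell array to something numeric, one option is to stack the inner cells into a single N-by-4 cell array and convert the last three columns. The sketch below uses a shortened copy of the data as an in-memory string; the names rows, fields, labels and coords are made up for illustration.

```matlab
% Two rows of the sample data, as fscanf(...,'%s') would return them
File2Vector = 'O2,-0.23799,0.65588,-0.69492;O1,0.50665,0.83915,1.47685;';

rows = strsplit(File2Vector, ';');
rows = rows(~cellfun('isempty', rows));   % drop the empty piece after the last ';'

FullArray = regexp(rows, ',', 'split');   % cell array of 1x4 cell arrays
fields = vertcat(FullArray{:});           % stack into one N-by-4 cell array

labels = fields(:, 1);                    % atom labels as character vectors
coords = str2double(fields(:, 2:4));      % N-by-3 numeric matrix of coordinates
```

The empty trailing piece matters: without removing it, the last inner cell has a single element and vertcat fails on the mismatched sizes.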
However, there are easier functions in Matlab which load text files based on regular expressions.
Usually, the easiest function for reading mixed data is readtable.
data = readtable('methylformate.txt');
However, in your case this is more complex because:
readtable does not recognise the .xyz extension, so you'd have to copy the file to .txt (or pass 'FileType', 'text').
The semicolons confuse the read and turn the last column into characters.
You can loop through each row and use textscan like so:
fid = fopen('methylformate.xyz');
tline = fgetl(fid);
myoutput = cell(0,4);
while ischar(tline)
    myoutput(end+1,:) = textscan(tline, '%s %f %f %f %*[^\n]', 'Delimiter', ',');
tline = fgetl(fid);
end
fclose(fid);
Output is a cell array of strings or doubles (as appropriate).

Matlab - string containing a number and equal sign

I have a data file that contains parameter names and values with an equal sign in between them. It's like this:
A = 1234
B = 1353.335
C =
D = 1
There is always one space before and after the equal sign. The problem is some variables don't have values assigned to them like "C" above and I need to weed them out.
I want to read the data file (text) into a cell and just remove the lines with those invalid statements or just create a new data file without them.
Whichever is easier, but I will eventually read the file into a cell with textscan command.
The values (numbers) will be treated as double precision.
Please, help.
Thank you,
Eric
Try this:
fid = fopen('file.txt'); %// open file
x = textscan(fid, '%s', 'delimiter', '\n'); %// or '\r'. Read each line into a cell
fclose(fid); %// close file
x = x{1}; %// each cell of x contains a line of the file
ind = ~cellfun(@isempty, regexp(x, '=\s[\d\.]+$')); %// desired lines: space, numbers, end
x = x(ind); %// keep only those lines
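If the names and values are needed separately, the same idea can be extended with regexp tokens. The sketch below works on an in-memory copy of the sample lines and assumes, as in the question, exactly one space on each side of the equal sign and plain unsigned decimal values; negative numbers or exponents would need a wider pattern.

```matlab
% Lines as they would come out of textscan(fid, '%s', 'delimiter', '\n')
x = {'A = 1234'; 'B = 1353.335'; 'C ='; 'D = 1'};

% Capture the name and the value; lines without a value produce no match
tok = regexp(x, '^(\w+) = ([\d\.]+)$', 'tokens', 'once');
tok = tok(~cellfun('isempty', tok));      % drop the lines that did not match

names  = cellfun(@(t) t{1}, tok, 'UniformOutput', false);
values = cellfun(@(t) str2double(t{2}), tok);   % double precision values
```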
If you just want to get the variables, and reject lines that do not have any character, this might work (the data.txt is just a txt generated by the example of data you have given):
fid = fopen('data.txt');
tline = fgets(fid);
while ischar(tline)
tmp = cell2mat(regexp(tline,'\=(.*)','match'));
b=str2double(tmp(2:end));
if ~isnan(b)
disp(b)
end
tline = fgets(fid);
end
fclose(fid);
I am reading the txt file line by line, using regular expressions to strip the unneeded characters, and then converting the value read to double.

Textscan until end of line

I'm trying to textscan a file and read a single line until the end of it, independently of the number of elements in that line.
My file is a .txt file formatted like this :
602,598,302,456,1023,523,....
293,291,566,331,987,56,....
589,202,429,2911,294,567,...
And so on. I have the number of the line, and all lines have the same number of elements, but it can vary from one file to another.
I wrote something like:
fid = fopen('somefile.txt');
C = textscan(fid, formatSpec,'HeaderLines',Row-1);
TheLine = C{1};
fclose(fid);
X = numel(TheLine);
plot(1:X,TheLine);
I really don't know what to type in the formatSpec field. I've tried a few things along the lines of %[^\n] but I didn't have much success.
Try this -
C = textscan(fid, '%d,','HeaderLines',Row-1);
Row will specify the row of data that you want to extract from the text file.
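An alternative sketch, if only that one line should be read and nothing after it: read the line as a single string by applying the format once, then split it on commas. The sample file contents and the value of Row are made up for illustration.

```matlab
% Build a small sample file (hypothetical contents)
tmpFile = [tempname '.txt'];
fid = fopen(tmpFile, 'w');
fprintf(fid, '602,598,302\n293,291,566\n589,202,429\n');
fclose(fid);

Row = 2;   % line number to extract

% Skip Row-1 lines, then apply the format exactly once to read one whole line
fid = fopen(tmpFile, 'r');
C = textscan(fid, '%s', 1, 'Delimiter', '\n', 'HeaderLines', Row-1);
fclose(fid);

% Split the line on commas and convert each piece to a number
TheLine = str2double(strsplit(C{1}{1}, ','));

delete(tmpFile);
```

Because the whole line is read as text first, the number of elements per line does not have to be known in advance.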

Text Scanning to read in unknown number of variables and unknown number of runs

I am trying to read in a csv file which will have the format
Var1 Val1A Val1B ... Val1Q
Var2 Val2A Val2B ... Val2Q
...
And I will not know ahead of time how many variables (rows) or how many runs (columns) will be in the file.
I have been trying to get text scan to work but no matter what I try I cannot get either all the variable names isolated or a rows by columns cell array. This is what I've been trying.
fID = fopen(strcat(pwd,'/',inputFile),'rt');
if fID == -1
disp('Could not find file')
return
end
vars = textscan(fID, '%s,%*s','delimiter','\n');
fclose(fID);
Does anyone have a suggestion?
If the file has the same number of columns in each row (you just don't know how many to begin with), try the following.
First, work out the number of columns by parsing just the first row, then parse the full file:
% Open the file, get the first line
fid = fopen('myfile.txt');
firstLine = fgetl(fid);
fclose(fid);
tmp = textscan(firstLine, '%s');
% textscan returns a 1x1 cell; the number of strings inside it is the number of columns
n = numel(tmp{1});
% Now scan the file
fid = fopen('myfile.txt');
tmp = textscan(fid, repmat('%s ', [1, n]));
fclose(fid);
For any given file, are all the lines equal length? If they are, you could start by reading in the first line and use that to count the number of fields and then use textscan to read in the file.
fID = fopen(strcat(pwd,'/',inputFile),'rt');
firstLine = fgetl(fID);
numFields = length(strfind(firstLine,' ')) + 1;
fclose(fID);
formatString = repmat('%s',1,numFields);
fID = fopen(strcat(pwd,'/',inputFile),'rt');
vars = textscan(fID, formatString, 'Delimiter', ' ');
fclose(fID);
Now you will have a cell array where the first entry holds the variable names and the remaining entries hold the observations.
In this case I assumed the delimiter was space even though you said it was a csv file. If it is really commas, you can change the code accordingly.
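Putting the pieces together, a minimal sketch might look like the following. It assumes whitespace-separated fields, a name in the first column and numeric values in the rest; the sample file is made up. For a comma-separated file, adding 'Delimiter', ',' to the textscan call and splitting the first line on ',' should be the only changes needed.

```matlab
% Build a small sample file (hypothetical contents)
tmpFile = [tempname '.txt'];
fid = fopen(tmpFile, 'w');
fprintf(fid, 'Var1 1.0 2.0 3.0\nVar2 4.0 5.0 6.0\n');
fclose(fid);

fid = fopen(tmpFile, 'r');
firstLine = fgetl(fid);                   % peek at the first line
frewind(fid);                             % go back to the start of the file

% Count the fields, then build a format with one %s and numFields-1 %f
numFields = numel(strsplit(strtrim(firstLine), ' '));
formatString = ['%s' repmat(' %f', 1, numFields - 1)];

vars = textscan(fid, formatString);       % default delimiter is whitespace
fclose(fid);

varNames = vars{1};                       % cell array of variable names
values = [vars{2:end}];                   % rows-by-runs numeric matrix

delete(tmpFile);
```

Reading the value columns with %f instead of %s saves a separate conversion step, at the cost of assuming every value is numeric.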