Reading parameters from a text file into the workspace - matlab

I have a file which has the following information:
% ---------------------- location details --------------------------
%
% lat : latitude [minimum = -90, maximum = 90, unit =
% degrees north]
% lon : longitude [ minimum = -360, maximum = 360, unit =
% deg east]
% z: altitude (above sea level, m)
%---------------------------------------------------------------
% location:
lat = 54.35
lon = -2.9833
This is a small section of the file.
I would like to read some of this information into MATLAB, where the information can then be used to perform some calculations. The part of the file that I would like to read into MATLAB are those in the text file that are not commented, i.e have a % at the start of the line, and the variable should then be saved in the workspace. For example, I would like to have:
lat = 54.35
lon = -2.9833
in the workspace.
How would I go about this? I have read about textscan and fopen, although these don't really seem to help me in this instance.

The quick-and-dirty approach
The simplest solution I could think of to read this file indeed employs textscan :) and since the lines are conveniently written in valid MATLAB syntax, you could use eval later to evaluate them. Start by reading each line as one string (ignoring the comments in the header)
fid = fopen(filename);
C = textscan(fid, '%s', 'Delimiter', '', 'CommentStyle', '%')
fclose(fid);
Then feed the lines one by one into eval to create the variables in the MATLAB workspace:
cellfun(#eval, C{1});
What this does is interpret the line as a MATLAB command, i.e create variables as named in the file and assign the appropriate values. If you want to suppress the output of eval, you can use evalc instead to "absorb the output":
cellfun(#evalc, C{1}, 'UniformOutput', false);
This should work for your basic example, but it would fail if you have more than one instance of any parameter. Also note that the eval family is notoriously slow.
A more robust approach
If the lines in your file structure have the parameter name = number pattern, you can read the lines more intelligently:
fid = fopen(filename);
C = textscan(fid, '%[^= ]%*[= ]%f', 'CommentStyle', '%')
fclose(fid);
The %[^= ] in the pattern matches the first characters until the first space or equality sign. The %*[ =] ignores the equality sign and any trailing spaces, and then the numerical value is matched with %f. The resulting cell array C stores the parameter names in the first cell and their corresponding values in the second cell.
Now it's up to you to manipulate the parsed data. For instance, to extract all values of lat and lon, you can do this:
lat = C{2}(strcmp(C{1}, 'lat'));
lon = C{2}(strcmp(C{1}, 'lon'));
If you have more than one "lat" line, lat will be an array holding all these values.

Here's another quick and dirty way:
fp = fopen('foo.txt');
found = 1;
while ~feof(fp)
line = fgetl(fp);
if (line(1) ~= '%') && ischar(line)
value(found) = sscanf(line,'%*s %*s %f');
found = found + 1;
end
end
The %*s skips the 'lat' or 'long' and the '='.

The example you provided is kinda well-behaved, therefore the following solution might need some tailoring. However, I would recommend it against any eval():
% Read whole file ignoring lines that start with '%' and using '=' as delimiter
fid = fopen('test.txt');
s = textscan(fid,'%s%f', 'CommentStyle','%','Delimiter','=');
fclose(fid);
% Identify lines with latitude and those with longitude
idxLat = strncmpi('lat',s{1},3);
idxLon = strncmpi('lon',s{1},3);
% Store all latitudes and longitudes
lat = s{2}(idxLat);
lon = s{2}(idxLon);

Gets you a structure with field names matching parameter names, accepts comma-separated lists. List any parameters that should stay as strings in char_params
char_params={};
fid = fopen(filename);
% Load lines into cell (1x1) containing cell array s (Nx1),
% skipping lines starting with % and cutting off anything after % in a line
s = textscan(fid,'%s', 'CommentStyle','%','Delimiter','%');
fclose(fid);
% access the lines strings s{1}, split across '=' and remove whitespace on both sides
s=strtrim(split(s{1},'='));
% Interpret parameters and save to structure
for ind=1:length(s)
% User says which parameters are strings
if any(strcmpi(s{ind,1},char_params))
file_struct.(s{ind,1})=s{ind,2};
% Otherwise, assume they are numbers or numeric row arrays
else
% remove parentheses and brackets
trim_s=regexprep(s(ind,2),'[[]()]','');
% convert comma-separated lists into row arrays
file_struct.(s{ind,1})=str2double(split(trim_s{1},',')).';
end
end

Related

Convert .csv to .out (complex numbers)

I have a csv file that has complex numbers.
This is sample of some numbers I have in the csv file:
(0.12825663763789857+0.20327998150393212j),(0.21890748607218197+0.160563964013564j),(0.28205414129281525+0.09884068776334366j),(0.030927026479380615+0.26334550583848626j)
I want to read this file and then save in (.out) file all the real parts in the first column and all the imaginary parts in the second column (without the imaginary letter j).
Here is one attempt. It is slightly more complicated due to the ( and ) that surround your numbers.
First, use textscan to read the file. Since I guess you don't know how many numers are in the file, read everything into a singe string. Will work with mutiple lines, too:
filename = 'data.csv';
fid = fopen(filename);
content = textscan(fid, '%s');
fclose(fid);
For this purpose, content now is a slightly weird cell array (look at the textscan-docs for details). Just initialize the variable nums which will store the numbers and loop through content (if you know a bit more about your csv file, you might pre-allocate nums):
nums = [];
for c1 = 1:numel(content{1})
Next, split the string at every occurence of ,:
string_list = strsplit(content{1}{c1},',');
This gives another cell array. Loop through it to convert the strings to numbers (and end the outer loop):
for c2 = 1 : numel(string_list)
nums(end+1) = str2num(string_list{c2});
end
end
Last, just store the real and the imaginary part of the numbers in separate columns:
out = [];
out(:,1) = real(nums);
out(:,2) = imag(nums);
and save it to data.out.
Update As you mentioned precision, you could use
dlmwrite('data.out', out, 'precision','%.20f');
However, here you need to understand the floating point representation in Matlab. In particular, try to understand the following:
>> a = 0.12825663763789857
a =
0.1283
>> fprintf('%.20f\n', a)
0.12825663763789857397
>> eps(a)
ans =
2.7756e-17
Note that one could have done this without cenverting the strings to numbers, but the way above would allow you to use the data in Matlab instead of just saving it.
HEre is an attempt without converting your strings to numbers, therefore one does not have to deal with precision. It works with negative real and imaginary numbers, too. + signs are removed when written to the new file, - signs are preserved:
filename = 'data.csv';
fid = fopen(filename);
content = textscan(fid, '%s');
fclose(fid);
fid = fopen('data.out','w');
pattern = '(?<real>-{0,1}\d+.\d+)(?<imag>[+-]\d+.\d+)j';
for c1 = 1:numel(content{1})
result = regexp(content{1}{c1}, pattern, 'names');
for c2 = 1:numel(result)
fprintf(fid, '%s,%s\n', strrep(result(c2).real,'+',''), strrep(result(c2).imag,'+',''));
end
end
fclose(fid);

String vector to array

I am trying to make a script in Matlab that pulls data from a file and generates an array of data. Since the data is a string I've tried to split it into columns, take the transpose, and split it into columns again to populate an array.
When I run the script I don't get any errors, but I also don't get any useful data. I tell it to display the final vector (Full_Array) and I get {1×4 cell} 8 times. When I try to use strsplit I get the error:
'Error using strsplit (line 80) First input must be either a character vector or a string scalar.'
I'm pretty new to Matlab and I honestly have no clue how to fix it after reading through similar threads and the documentation I'm out of ideas. I've attached the code and the data to read in below. Thank you.
clear
File_Name = uigetfile; %Brings up windows file browser to locate .xyz file
Open_File = fopen(File_Name); %Opens the file given by File_Name
File2Vector = fscanf(Open_File,'%s'); %Prints the contents of the file to a 1xN vector
Vector2ColumnArray = strsplit(File2Vector,';'); %Splits the string vector from
%File2Vector into columns, forming an array
Transpose = transpose(Vector2ColumnArray); %Takes the transpose of Vector2ColumnArray
%making a column array into a row array
FullArray = regexp(Transpose, ',', 'split');
The data I am trying to read in comes from a .xyz file that I have titled methylformate.xyz, here is the data:
O2,-0.23799,0.65588,-0.69492;
O1,0.50665,0.83915,1.47685;
C2,-0.32101,2.08033,-0.75096;
C1,0.19676,0.17984,0.49796;
H4,0.66596,2.52843,-0.59862;
H3,-0.67826,2.36025,-1.74587;
H2,-1.03479,2.45249,-0.00927;
H1,0.23043,-0.91981,0.45346;
When I started using Matlab I also had problems with the data structure. The last line
FullArray = regexp(Transpose, ',', 'split');
splits each line and stores it in a cell array. In order to access the individual strings you have to index with curly brackets into FullArray:
FullArray{1}{1} % -> 'O2'
FullArray{1}{2} % -> '-0.23799'
FullArray{2}{1} % -> 'O1'
FullArray{2}{2} % -> '0.50665'
Thereby the first number corresponds to the row and the second to the particular element in the row.
However, there are easier functions in Matlab which load text files based on regular expressions.
Usually, the easiest function for reading mixed data is readtable.
data = readtable('methylformate.txt');
However, in your case this is more complex because
readtable can't cope with .xyz files, so you'd have to copy to .txt
The semi-colons confuse the read and make the last column characters
You can loop through each row and use textscan like so:
fid = fopen('methylformate.xyz');
tline = fgetl(fid);
myoutput = cell(0,4);
while ischar(tline)
myoutput(end+1,:) = textscan(tline, '%s %f %f %f %*[^\n]', 'delim', ',');
tline = fgetl(fid);
end
fclose(fid);
Output is a cell array of strings or doubles (as appropriate).

Matlab: How to read commented portion of ascii file

I have an ascii file whose first couple hundred lines are commented (followed by the data) that give some information about the data. For example these are couple of lines I snipped out from large number of lines which are commented:
Right now I am only reading the data without comments by using load as:
filename = uigetfile('*.dat', 'Select Input data');
Data = load(filename, '-ascii');
How can I read the commented lines (which end just before the data starts) and pick some comments out of all comments based on some identifications such as Program name and version, Creation date etc. ?
Use textscan to read the lines into a cell array:
fid = fopen(filename, 'r');
C = textscan(fid, '%s', 'Delimiter', '\n');
C = C{:}; %// Flatten cell array
fclose(fid);
Now you can use regexp to manipulate the textual data. For instance, to find the comment lines that contain the string "Creation date", you can do this:
idx = ~cellfun('isempty', regexp(C, "^\s*%.*Creation date"));
where "^\s*% matches the percent sign (%) at the beginning of the line along with any leading whitespace, and the .* matches any number of characters until the occurrence of "Creation date". Needless to say, you can adjust the regular expression pattern to your liking.
The resulting variable idx stores a logical (i.e boolean) vector with "1"s at the positions of the lines matching the pattern (you can obtain their explicit numerical indices with find(idx)). Next you can filter those lines with C(idx) or iterate over them with a for loop.
fid = fopen(filename);
nHeaderRows = 412;
headerCell = cell(nHeaderRows, 1);
for i=1:nHeaderRows
headerCell{i} = fgets(fid);
end
headerText = char(headerCell);

matlab reading variables with varying lengths into the workspace

This is a follow up question to
Reading parameters from a text file into the workspace
I am wondering, how would I read the following:
% ---------------------- details --------------------------
%---------------------------------------------------------------
% location:
lat = 54.35
lon = -2.9833
%
Eitan T suggested using:
fid = fopen(filename);
C = textscan(fid, '%[^= ]%*[= ]%f', 'CommentStyle', '%')
fclose(fid);
to obtain the information from the file and then
lat = C{2}(strcmp(C{1}, 'lat'));
lon = C{2}(strcmp(C{1}, 'lon'));
to obtain the relevant parameters. How could I alter this to read the following:
% ---------------------- details --------------------------
%---------------------------------------------------------------
% location:
lat = 54.35
lon = -2.9833
heights = 0, 2, 4, 7, 10, 13, 16, 19, 22, 25, 30, 35
Volume = 6197333, 5630000, 4958800, 4419400, 3880000, 3340600,
3146800, 2780200, 2413600, 2177000, 1696000, 811000
%
where the variable should contain all of the data points following the equal sign (up until the start of the next variable, Volume in this case)?
Thanks for your help
Here's one method, which uses some filthy string hacking and eval to get the result. This works on your example, but I wouldn't really recommend it:
fid = fopen('data.txt');
contents = textscan(fid, '%s', 'CommentStyle', '%', 'Delimiter', '\n');
contents = contents{1};
for ii = 1:length(contents)
line = contents{ii};
eval( [strrep(line, '=', '=['), '];'] ) # convert to valid Matlab syntax
end
A better method would be to read each of the lines using textscan
for ii = 1:length(contents)
idx = strfind(contents{ii}, ' = ');
vars{ii} = contents{ii}(1:idx-1);
vals(ii) = textscan(contents{ii}(idx+3:end), '%f', 'Delimiter', ',');
end
Now the variables vars and vals have the names of your variables, and their values. To extract the values you could do something like
ix = strmatch('lat', vars, 'exact');
lat = vals{ix};
ix = strmatch('lon', vars, 'exact');
lon = vals{ix};
ix = strmatch('heights', vars, 'exact');
heights = vals{ix};
ix = strmatch('Volume', vars, 'exact');
volume = vals{ix};
This can be accomplished using a 2-step approach:
Read the leading string (first element), equals sign (ignored), and the rest of the line as a string (second element)
Convert these strings-of-the-rest-of-the-lines to floats (second element)
There is however a slight drawback here; your lines seem to follow two formats; one is the one described in step 1), the other is a continuation of the previous line, which contains numbers.
Because of this, an extra step is required:
Read the leading string (first element), equals sign (ignored), and the rest of the line as a string (second element)
This will fail when the "other format" is encountered. Detect this, correct this, and continue
Convert these strings-of-the-rest-of-the-lines to floats (second element)
I think this will do the trick:
fid = fopen('data.txt');
C = [];
try
while ~feof(fid)
% Read next set of data, assuming the "first format"
C_new = textscan(fid, '%[^= ]%*[= ]%s', 'CommentStyle', '%', 'Delimiter', '');
C = [C; C_new]; %#ok
% When we have non-trivial data, check for the "second format"
if ~all(cellfun('isempty', C_new))
% Second format detected!
if ~feof(fid)
% get the line manually
D = fgetl(fid);
% Insert new data from fgetl
C{2}(end) = {[C{2}{end} C{1}{end} D]};
% Correct the cell
C{1}(end) = [];
end
else
% empty means we've reached the end
C = C(1:end-1,:);
end
end
fclose(fid);
catch ME
% Just to make sure to not have any file handles lingering about
fclose(fid);
throw(ME);
end
% convert second elements to floats
C{2} = cellfun(#str2num, C{2}, 'UniformOutput', false);
If you can get rid of the multi-line Volume line, what you have written is valid matlab. So, just invoke the parameter file as a matlab script using the run command.
run(scriptName)
Your only other alternative, as others have shown, is to write what will end up looking like a bastardized Matlab parser. There are definitely better ways to spend your time than doing that!

Problem (bug?) loading hexadecimal data into MATLAB

I'm trying to load the following ascii file into MATLAB using load()
% some comment
1 0xc661
2 0xd661
3 0xe661
(This is actually a simplified file. The actual file I'm trying to load contains an undefined number of columns and an undefined number of comment lines at the beginning, which is why the load function was attractive)
For some strange reason, I obtain the following:
K>> data = load('testMixed.txt')
data =
1 50785
2 58977
3 58977
I've observed that the problem occurs anytime there's a "d" in the hexadecimal number.
Direct hex2dec conversion works properly:
K>> hex2dec('d661')
ans =
54881
importdata seems to have the same conversion issue, and so does the ImportWizard:
K>> importdata('testMixed.txt')
ans =
1 50785
2 58977
3 58977
Is that a bug, am I using the load function in some prohibited way, or is there something obvious I'm overlooking?
Are there workarounds around the problem, save from reimplementing the file parsing on my own?
Edited my input file to better reflect my actual file format. I had a bit oversimplified in my original question.
"GOLF" ANSWER:
This starts with the answer from mtrw and shortens it further:
fid = fopen('testMixed.txt','rt');
data = textscan(fid,'%s','Delimiter','\n','MultipleDelimsAsOne','1',...
'CommentStyle','%');
fclose(fid);
data = strcat(data{1},{' '});
data = sscanf([data{:}],'%i',[sum(isspace(data{1})) inf]).';
PREVIOUS ANSWER:
My first thought was to use TEXTSCAN, since it has an option that allows you to ignore certain lines as comments when they start with a given character (like %). However, TEXTSCAN doesn't appear to handle numbers in hexadecimal format well. Here's another option:
fid = fopen('testMixed.txt','r'); % Open file
% First, read all the comment lines (lines that start with '%'):
comments = {};
position = 0;
nextLine = fgetl(fid); % Read the first line
while strcmp(nextLine(1),'%')
comments = [comments; {nextLine}]; % Collect the comments
position = ftell(fid); % Get the file pointer position
nextLine = fgetl(fid); % Read the next line
end
fseek(fid,position,-1); % Rewind to beginning of last line read
% Read numerical data:
nCol = sum(isspace(nextLine))+1; % Get the number of columns
data = fscanf(fid,'%i',[nCol inf]).'; % Note '%i' works for all integer formats
fclose(fid); % Close file
This will work for an arbitrary number of comments at the beginning of the file. The computation to get the number of columns was inspired by Jacob's answer.
New:
This is the best I could come up with. It should work for any number of comment lines and columns. You'll have to do the rest yourself if there are strings, etc.
% Define the characters representing the start of the commented line
% and the delimiter
COMMENT_START = '%%';
DELIMITER = ' ';
% Open the file
fid = fopen('testMixed.txt');
% Read each line till we reach the data
l = COMMENT_START;
while(l(1)==COMMENT_START)
l = fgetl(fid);
end
% Compute the number of columns
cols = sum(l==DELIMITER)+1;
% Split the first line
split_l = regexp(l,' ','split');
% Read all the data
A = textscan(fid,'%s');
% Compute the number of rows
rows = numel(A{:})/cols;
% Close the file
fclose(fid);
% Assemble all the data into a matrix of cell strings
DATA = [split_l ; reshape(A{:},[cols rows])']; %' adding this to make it pretty in SO
% Recognize each column and process accordingly
% by analyzing each element in the first row
numeric_data = zeros(size(DATA));
for i=1:cols
str = DATA(1,i);
% If there is no '0x' present
if isempty(findstr(str{1},'0x')) == true
% This is a number
numeric_data(:,i) = str2num(char(DATA(:,i)));
else
% This is a hexadecimal number
col = char(DATA(:,i));
numeric_data(:,i) = hex2dec(col(:,3:end));
end
end
% Display the data
format short g;
disp(numeric_data)
This works for data like this:
% Comment 1
% Comment 2
1.2 0xc661 10 0xa661
2 0xd661 20 0xb661
3 0xe661 30 0xc661
Output:
1.2 50785 10 42593
2 54881 20 46689
3 58977 30 50785
OLD:
Yeah, I don't think LOAD is the way to go. You could try:
a = char(importdata('testHexa.txt'));
a = hex2dec(a(:,3:end));
This is based on both gnovice's and Jacob's answers, and is a "best of breed"
For files like:
% this is my comment
% this is my other comment
1 0xc661 123
2 0xd661 456
% surprise comment
3 0xe661 789
4 0xb661 1234567
(where the number of columns within the file MUST be the same, but not known ahead of time, and all comments denoted by a '%' character), the following code is fast and easy to read:
f = fopen('hexdata.txt', 'rt');
A = textscan(f, '%s', 'Delimiter', '\n', 'MultipleDelimsAsOne', '1', 'CollectOutput', '1', 'CommentStyle', '%');
fclose(f);
A = A{1};
data = sscanf(A{1}, '%i')';
data = repmat(data, length(A), 1);
for ctr = 2:length(A)
data(ctr,:) = sscanf(A{ctr}, '%i')';
end