String vector to array - matlab

I am trying to make a script in Matlab that pulls data from a file and generates an array of data. Since the data is a string I've tried to split it into columns, take the transpose, and split it into columns again to populate an array.
When I run the script I don't get any errors, but I also don't get any useful data. I tell it to display the final vector (Full_Array) and I get {1×4 cell} 8 times. When I try to use strsplit I get the error:
'Error using strsplit (line 80) First input must be either a character vector or a string scalar.'
I'm pretty new to Matlab and I honestly have no clue how to fix it after reading through similar threads and the documentation I'm out of ideas. I've attached the code and the data to read in below. Thank you.
clear
File_Name = uigetfile; %Brings up windows file browser to locate .xyz file
Open_File = fopen(File_Name); %Opens the file given by File_Name
File2Vector = fscanf(Open_File,'%s'); %Prints the contents of the file to a 1xN vector
Vector2ColumnArray = strsplit(File2Vector,';'); %Splits the string vector from
%File2Vector into columns, forming an array
Transpose = transpose(Vector2ColumnArray); %Takes the transpose of Vector2ColumnArray
%making a column array into a row array
FullArray = regexp(Transpose, ',', 'split');
The data I am trying to read in comes from a .xyz file that I have titled methylformate.xyz, here is the data:
O2,-0.23799,0.65588,-0.69492;
O1,0.50665,0.83915,1.47685;
C2,-0.32101,2.08033,-0.75096;
C1,0.19676,0.17984,0.49796;
H4,0.66596,2.52843,-0.59862;
H3,-0.67826,2.36025,-1.74587;
H2,-1.03479,2.45249,-0.00927;
H1,0.23043,-0.91981,0.45346;

When I started using Matlab I also had problems with the data structure. The last line
FullArray = regexp(Transpose, ',', 'split');
splits each line and stores it in a cell array. In order to access the individual strings you have to index with curly brackets into FullArray:
FullArray{1}{1} % -> 'O2'
FullArray{1}{2} % -> '-0.23799'
FullArray{2}{1} % -> 'O1'
FullArray{2}{2} % -> '0.50665'
Thereby the first number corresponds to the row and the second to the particular element in the row.
However, there are easier functions in Matlab which load text files based on regular expressions.

Usually, the easiest function for reading mixed data is readtable.
data = readtable('methylformate.txt');
However, in your case this is more complex because
readtable can't cope with .xyz files, so you'd have to copy to .txt
The semi-colons confuse the read and make the last column characters
You can loop through each row and use textscan like so:
fid = fopen('methylformate.xyz');
tline = fgetl(fid);
myoutput = cell(0,4);
while ischar(tline)
myoutput(end+1,:) = textscan(tline, '%s %f %f %f %*[^\n]', 'delim', ',');
tline = fgetl(fid);
end
fclose(fid);
Output is a cell array of strings or doubles (as appropriate).

Related

Convert .csv to .out (complex numbers)

I have a csv file that has complex numbers.
This is sample of some numbers I have in the csv file:
(0.12825663763789857+0.20327998150393212j),(0.21890748607218197+0.160563964013564j),(0.28205414129281525+0.09884068776334366j),(0.030927026479380615+0.26334550583848626j)
I want to read this file and then save in (.out) file all the real parts in the first column and all the imaginary parts in the second column (without the imaginary letter j).
Here is one attempt. It is slightly more complicated due to the ( and ) that surround your numbers.
First, use textscan to read the file. Since I guess you don't know how many numers are in the file, read everything into a singe string. Will work with mutiple lines, too:
filename = 'data.csv';
fid = fopen(filename);
content = textscan(fid, '%s');
fclose(fid);
For this purpose, content now is a slightly weird cell array (look at the textscan-docs for details). Just initialize the variable nums which will store the numbers and loop through content (if you know a bit more about your csv file, you might pre-allocate nums):
nums = [];
for c1 = 1:numel(content{1})
Next, split the string at every occurence of ,:
string_list = strsplit(content{1}{c1},',');
This gives another cell array. Loop through it to convert the strings to numbers (and end the outer loop):
for c2 = 1 : numel(string_list)
nums(end+1) = str2num(string_list{c2});
end
end
Last, just store the real and the imaginary part of the numbers in separate columns:
out = [];
out(:,1) = real(nums);
out(:,2) = imag(nums);
and save it to data.out.
Update As you mentioned precision, you could use
dlmwrite('data.out', out, 'precision','%.20f');
However, here you need to understand the floating point representation in Matlab. In particular, try to understand the following:
>> a = 0.12825663763789857
a =
0.1283
>> fprintf('%.20f\n', a)
0.12825663763789857397
>> eps(a)
ans =
2.7756e-17
Note that one could have done this without cenverting the strings to numbers, but the way above would allow you to use the data in Matlab instead of just saving it.
HEre is an attempt without converting your strings to numbers, therefore one does not have to deal with precision. It works with negative real and imaginary numbers, too. + signs are removed when written to the new file, - signs are preserved:
filename = 'data.csv';
fid = fopen(filename);
content = textscan(fid, '%s');
fclose(fid);
fid = fopen('data.out','w');
pattern = '(?<real>-{0,1}\d+.\d+)(?<imag>[+-]\d+.\d+)j';
for c1 = 1:numel(content{1})
result = regexp(content{1}{c1}, pattern, 'names');
for c2 = 1:numel(result)
fprintf(fid, '%s,%s\n', strrep(result(c2).real,'+',''), strrep(result(c2).imag,'+',''));
end
end
fclose(fid);

Trouble loading in mixed data from txt file MATLAB

I'm trying to load in data from a text file. The first two rows are headers, following the headers the first two columns are date and time. The rest of the columns are floats.
data should have 11 columns, however, whos returns that size is only 1x3
Data txt file:
fid = fopen('allunderway.txt', 'rt');
data = textscan(fid, '%{M/dd/yyyy}D %{HH:mm:ss}D %4.2f %2.4f %2.5f %2.4f %2.4f %2.2f %4.2f %3.1f %1.4f', 'HeaderLines', 2, 'CollectOutput', true);
fclose(fid);
whos data
date = data{1};
time = data{2};
wnd_td = data{10};
wnd_ts = data{11};
You could try using a delimiter instead, seems like this is a tab separated file.
you might have to try both 'rt' and 'r' in the fopen command.
As for the textscan part try adding this
'Delimiter','\t','EmptyValue',NaN
It adds tabs as a delimiter and replaces empty values with NaN.
or uses spaces as delimiters and set it so that it doesn't matter if there's 1 or multiple spaces
'Delimiter',' ','MultipleDelimsAsOne',1
Or use 'Whitespace' as the delimiter (uses both tabs and spaces).

Read a text file and separate the data into different columns and different tables in MATLAB

I have a huge text file that needs to be read and processed in MATLAB. This file at some points contain text to indicate that a new data series has started.
I have searched here but cant find any simple solution.
So what I want to do is to read the data in the file, put the data in a table in three different columns and when it finds text it should create a new table. It should repeat this process until the entire document is scanned.
This is how the document looks like:
time V(A,B) I(R1)
Step Information: X=1 (Run: 1/11)
0.000000000000000e+000 -2.680148e-016 0.000000e+00
9.843925313007988e-012 -4.753470e-006 2.216314e-011
1.000052605772457e-011 -4.835427e-006 2.552497e-011
1.031372754715773e-011 -4.999340e-006 -3.042096e-012
1.094013052602406e-011 -5.327165e-006 -1.206968e-011
Step Information: X=1 (Run: 2/11)
0.000000000000000e+000 -2.680148e-016 0.000000e+000
9.843925313007988e-012 -4.753470e-006 2.216314e-011
1.000052605772457e-011 -4.835427e-006 2.552497e-011
1.031372754715773e-011 -4.999340e-006 -3.042096e-012
1.094013052602406e-011 -5.327165e-006 -1.206968e-011
A rather crude approach is to read the file line by line and check if the line consists of three numbers. If it does, then append this to a temporary matrix. When you finally get to a line that doesn't contain three numbers, append this matrix as an element in a cell array, clear the temporary matrix and continue.
Something like this would work, assuming that the file is stored in 'file.txt':
%// Open the file
f = fopen('file.txt', 'r');
%// Initialize empty cell array
data = {};
%// Initialize temporary matrix
temp = [];
%// Loop over the file...
while true
%// Get a line from the file
line = fgetl(f);
%// If we reach the end of the file, get out
if line == -1
%// Last check before we break
%// Check if the temporary matrix isn't empty and add
if ~isempty(temp)
data = [data; temp];
end
break;
end
%// Else, check to see if this line contains three numbers
numbers = textscan(line, '%f %f %f');
%// If this line doesn't consist of three numbers...
if all(cellfun(#isempty, numbers))
%// If the temporary matrix is empty, skip
if isempty(temp)
continue;
end
%// Concatenate to cell array
data = [data; temp];
%// Reset temporary matrix
temp = [];
%// If this does, then create a row vector and concatenate
else
temp = [temp; numbers{:}];
end
end
%// Close the file
fclose(f);
The code is pretty self-explanatory but let's go into it to be sure you know what's going on. First open up the file with fopen to get a "pointer" to the file, then initialize our cell array that will contain our matrices as well as the temporary matrix used when reading in matrices in between header information. After we simply loop over each line of the file and we can grab a line with fgetl using the file pointer we created. We then check to see if we have reached the end of the file and if we have, let's check to see if the temporary matrix has any numerical data in it. If it does, add this into our cell array then finally get out of the loop. We use fclose to close up the file and clean things up.
Now the heart of the operation is what follows after this check. We use textscan and search for three numbers separated by spaces. That's done with the '%f %f %f' format specifier. This should give you a cell array of three elements if you are successful with numbers. If this is correct, then convert this cell array of elements into a row of numbers and concatenate this into the temporary matrix. Doing temp = [temp; numbers{:}]; facilitates this concatenation. Simply put I piece together each number and concatenate them horizontally to create a single row of numbers. I then take this row and concatenate this as another row in the temporary matrix.
Should we finally get to a line where it's all text, this will give you all three elements in the cell array found by textscan to be empty. That's the purpose of the all and cellfun call. We search each element in the cell and see if it's empty. If every element is empty, this is a line that is text. If this situation arises, simply take the temporary matrix and add this as a new entry into your cell array. You'd then reset the temporary matrix and start the logic over again.
However, we also have to take into account that there may be multiple lines that consist of text. That's what the additional if statement is for inside the first if block using all. If we have an additional line of text that precedes a previous line of text, the temporary matrix of values should still be empty and so you should check to see if that is empty before you try and concatenate the temporary matrix. If it's empty, don't bother and just continue.
After running this code, I get the following for my data matrix:
>> format long g
>> celldisp(data)
data{1} =
0 -2.680148e-16 0
9.84392531300799e-12 -4.75347e-06 2.216314e-11
1.00005260577246e-11 -4.835427e-06 2.552497e-11
1.03137275471577e-11 -4.99934e-06 -3.042096e-12
1.09401305260241e-11 -5.327165e-06 -1.206968e-11
data{2} =
0 -2.680148e-16 0
9.84392531300799e-12 -4.75347e-06 2.216314e-11
1.00005260577246e-11 -4.835427e-06 2.552497e-11
1.03137275471577e-11 -4.99934e-06 -3.042096e-12
1.09401305260241e-11 -5.327165e-06 -1.206968e-11
To access a particular "table", do data{ii} where ii is the table you want to access that was read in from top to bottom in your text file.
The most versatile way is to read line by line using textscan. If you want to speed this process up, you can have a dummy read first:
ie. You loop through all the lines without storing the data and decide which lines are the text lines and which are numbers, recording a quick number of lines for each.
You then have enough information about the data to run through quickly the arrays. This will speed up the time it takes to store the data in your new arrays massively.
Your second loop is the one that actually reads the data into the array/s. You should now know which lines to skip. You can also pre-allocate the arrays within the data cell if you wish to.
fid = fopen('file.txt','r');
data = {};
nlines = [];
% now start the loop
k=0; % counter for data sets
while ~feof(fid)
line = fgetl(fid);
% check if is data or text
if all(ismember(line,' 0123456789+.')) % is it data
nlines(k) = nlines(k)+1;
else %is it text
k=k+1;
nlines(k) = 0;
end
end
frewind(fid); % go back to start of file
% You could preallocate the data array here if you wished
% now get the data
for aa = 1 : length(nlines)
if nlines(aa)==0;
continue
end
textscan(fid,'%s\r\n',1); % skip textline
data{aa} = textscan(fid,'%f%f%f\r\n',nlines(k));
end

Matlab: How to read commented portion of ascii file

I have an ascii file whose first couple hundred lines are commented (followed by the data) that give some information about the data. For example these are couple of lines I snipped out from large number of lines which are commented:
Right now I am only reading the data without comments by using load as:
filename = uigetfile('*.dat', 'Select Input data');
Data = load(filename, '-ascii');
How can I read the commented lines (which end just before the data starts) and pick some comments out of all comments based on some identifications such as Program name and version, Creation date etc. ?
Use textscan to read the lines into a cell array:
fid = fopen(filename, 'r');
C = textscan(fid, '%s', 'Delimiter', '\n');
C = C{:}; %// Flatten cell array
fclose(fid);
Now you can use regexp to manipulate the textual data. For instance, to find the comment lines that contain the string "Creation date", you can do this:
idx = ~cellfun('isempty', regexp(C, "^\s*%.*Creation date"));
where "^\s*% matches the percent sign (%) at the beginning of the line along with any leading whitespace, and the .* matches any number of characters until the occurrence of "Creation date". Needless to say, you can adjust the regular expression pattern to your liking.
The resulting variable idx stores a logical (i.e boolean) vector with "1"s at the positions of the lines matching the pattern (you can obtain their explicit numerical indices with find(idx)). Next you can filter those lines with C(idx) or iterate over them with a for loop.
fid = fopen(filename);
nHeaderRows = 412;
headerCell = cell(nHeaderRows, 1);
for i=1:nHeaderRows
headerCell{i} = fgets(fid);
end
headerText = char(headerCell);

Reading CSV with mixed type data

I need to read the following csv file in MATLAB:
2009-04-29 01:01:42.000;16271.1;16271.1
2009-04-29 02:01:42.000;2.5;16273.6
2009-04-29 03:01:42.000;2.599609;16276.2
2009-04-29 04:01:42.000;2.5;16278.7
...
I'd like to have three columns:
timestamp;value1;value2
I tried the approaches described here:
Reading date and time from CSV file in MATLAB
modified as:
filename = 'prova.csv';
fid = fopen(filename, 'rt');
a = textscan(fid, '%s %f %f', ...
'Delimiter',';', 'CollectOutput',1);
fclose(fid);
But it returs a 1x2 cell, whose first element is a{1}='ÿþ2', the other are empty.
I had also tried to adapt to my case the answers to these questions:
importing data with time in MATLAB
Read data files with specific format in matlab and convert date to matal serial time
but I didn't succeed.
How can I import that csv file?
EDIT After the answer of #macduff i try to copy-paste in a new file the data reported above and use:
a = textscan(fid, '%s %f %f','Delimiter',';');
and it works.
Unfortunately that didn't solve the problem because I have to process csv files generated automatically, which seems to be the cause of the strange MATLAB behavior.
What about trying:
a = textscan(fid, '%s %f %f','Delimiter',';');
For me I get:
a =
{4x1 cell} [4x1 double] [4x1 double]
So each element of a corresponds to a column in your csv file. Is this what you need?
Thanks!
Seems you're going about it the right way. The example you provide poses no problems here, I get the output you desire. What's in the 1x2 cell?
If I were you I'd try again with a smaller subset of the file, say 10 lines, and see if the output changes. If yes, then try 100 lines, etc., until you find where the 4x1 cell + 4x2 array breaks down into the 1x2 cell. It might be that there's an empty line or a single empty field or whatever, which forces textscan to collect data in an additional level of cells.
Note that 'CollectOutput',1 will collect the last two columns into a single array, so you'll end up with 1 cell array of 4x1 containing strings, and 1 array of 4x2 containing doubles. Is that indeed what you want? Otherwise, see #macduff's post.
I've had to parse large files like this, and I found I didn't like textscan for this job. I just use a basic while loop to parse the file, and I use datevec to extract the timestamp components into a 6-element time vector.
%% Optional: initialize for speed if you have large files
n = 1000 %% <# of rows in file - if known>
timestamp = zeros(n,6);
value1 = zeros(n,1);
value2 = zeros(n,1);
fid = fopen(fname, 'rt');
if fid < 0
error('Error opening file %s\n', fname); % exit point
end
cntr = 0
while true
tline = fgetl(fid); %% get one line
if ~ischar(tline), break; end; % break out of loop at end of file
cntr = cntr + 1;
splitLine = strsplit(tline, ';'); %% split the line on ; delimiters
timestamp(cntr,:) = datevec(splitLine{1}, 'yyyy-mm-dd HH:MM:SS.FFF'); %% using datevec to parse time gives you a standard timestamp vector
value1(cntr) = splitLine{2};
value2(cntr) = splitLine{3};
end
%% Concatenate at the end if you like
result = [timestamp value1 value2];