Convert .csv to .out (complex numbers) - matlab

I have a csv file that has complex numbers.
This is sample of some numbers I have in the csv file:
(0.12825663763789857+0.20327998150393212j),(0.21890748607218197+0.160563964013564j),(0.28205414129281525+0.09884068776334366j),(0.030927026479380615+0.26334550583848626j)
I want to read this file and then save in (.out) file all the real parts in the first column and all the imaginary parts in the second column (without the imaginary letter j).

Here is one attempt. It is slightly more complicated due to the ( and ) that surround your numbers.
First, use textscan to read the file. Since I guess you don't know how many numers are in the file, read everything into a singe string. Will work with mutiple lines, too:
filename = 'data.csv';
fid = fopen(filename);
content = textscan(fid, '%s');
fclose(fid);
For this purpose, content now is a slightly weird cell array (look at the textscan-docs for details). Just initialize the variable nums which will store the numbers and loop through content (if you know a bit more about your csv file, you might pre-allocate nums):
nums = [];
for c1 = 1:numel(content{1})
Next, split the string at every occurence of ,:
string_list = strsplit(content{1}{c1},',');
This gives another cell array. Loop through it to convert the strings to numbers (and end the outer loop):
for c2 = 1 : numel(string_list)
nums(end+1) = str2num(string_list{c2});
end
end
Last, just store the real and the imaginary part of the numbers in separate columns:
out = [];
out(:,1) = real(nums);
out(:,2) = imag(nums);
and save it to data.out.
Update As you mentioned precision, you could use
dlmwrite('data.out', out, 'precision','%.20f');
However, here you need to understand the floating point representation in Matlab. In particular, try to understand the following:
>> a = 0.12825663763789857
a =
0.1283
>> fprintf('%.20f\n', a)
0.12825663763789857397
>> eps(a)
ans =
2.7756e-17
Note that one could have done this without cenverting the strings to numbers, but the way above would allow you to use the data in Matlab instead of just saving it.

HEre is an attempt without converting your strings to numbers, therefore one does not have to deal with precision. It works with negative real and imaginary numbers, too. + signs are removed when written to the new file, - signs are preserved:
filename = 'data.csv';
fid = fopen(filename);
content = textscan(fid, '%s');
fclose(fid);
fid = fopen('data.out','w');
pattern = '(?<real>-{0,1}\d+.\d+)(?<imag>[+-]\d+.\d+)j';
for c1 = 1:numel(content{1})
result = regexp(content{1}{c1}, pattern, 'names');
for c2 = 1:numel(result)
fprintf(fid, '%s,%s\n', strrep(result(c2).real,'+',''), strrep(result(c2).imag,'+',''));
end
end
fclose(fid);

Related

String vector to array

I am trying to make a script in Matlab that pulls data from a file and generates an array of data. Since the data is a string I've tried to split it into columns, take the transpose, and split it into columns again to populate an array.
When I run the script I don't get any errors, but I also don't get any useful data. I tell it to display the final vector (Full_Array) and I get {1×4 cell} 8 times. When I try to use strsplit I get the error:
'Error using strsplit (line 80) First input must be either a character vector or a string scalar.'
I'm pretty new to Matlab and I honestly have no clue how to fix it after reading through similar threads and the documentation I'm out of ideas. I've attached the code and the data to read in below. Thank you.
clear
File_Name = uigetfile; %Brings up windows file browser to locate .xyz file
Open_File = fopen(File_Name); %Opens the file given by File_Name
File2Vector = fscanf(Open_File,'%s'); %Prints the contents of the file to a 1xN vector
Vector2ColumnArray = strsplit(File2Vector,';'); %Splits the string vector from
%File2Vector into columns, forming an array
Transpose = transpose(Vector2ColumnArray); %Takes the transpose of Vector2ColumnArray
%making a column array into a row array
FullArray = regexp(Transpose, ',', 'split');
The data I am trying to read in comes from a .xyz file that I have titled methylformate.xyz, here is the data:
O2,-0.23799,0.65588,-0.69492;
O1,0.50665,0.83915,1.47685;
C2,-0.32101,2.08033,-0.75096;
C1,0.19676,0.17984,0.49796;
H4,0.66596,2.52843,-0.59862;
H3,-0.67826,2.36025,-1.74587;
H2,-1.03479,2.45249,-0.00927;
H1,0.23043,-0.91981,0.45346;
When I started using Matlab I also had problems with the data structure. The last line
FullArray = regexp(Transpose, ',', 'split');
splits each line and stores it in a cell array. In order to access the individual strings you have to index with curly brackets into FullArray:
FullArray{1}{1} % -> 'O2'
FullArray{1}{2} % -> '-0.23799'
FullArray{2}{1} % -> 'O1'
FullArray{2}{2} % -> '0.50665'
Thereby the first number corresponds to the row and the second to the particular element in the row.
However, there are easier functions in Matlab which load text files based on regular expressions.
Usually, the easiest function for reading mixed data is readtable.
data = readtable('methylformate.txt');
However, in your case this is more complex because
readtable can't cope with .xyz files, so you'd have to copy to .txt
The semi-colons confuse the read and make the last column characters
You can loop through each row and use textscan like so:
fid = fopen('methylformate.xyz');
tline = fgetl(fid);
myoutput = cell(0,4);
while ischar(tline)
myoutput(end+1,:) = textscan(tline, '%s %f %f %f %*[^\n]', 'delim', ',');
tline = fgetl(fid);
end
fclose(fid);
Output is a cell array of strings or doubles (as appropriate).

Matlab - string containing a number and equal sign

I have a data file that contains parameter names and values with an equal sign in between them. It's like this:
A = 1234
B = 1353.335
C =
D = 1
There is always one space before and after the equal sign. The problem is some variables don't have values assigned to them like "C" above and I need to weed them out.
I want to read the data file (text) into a cell and just remove the lines with those invalid statements or just create a new data file without them.
Whichever is easier, but I will eventually read the file into a cell with textscan command.
The values (numbers) will be treated as double precision.
Please, help.
Thank you,
Eric
Try this:
fid = fopen('file.txt'); %// open file
x = textscan(fid, '%s', 'delimiter', '\n'); %// or '\r'. Read each line into a cell
fclose(fid); %// close file
x = x{1}; %// each cell of x contains a line of the file
ind = ~cellfun(#isempty, regexp(x, '=\s[\d\.]+$')); %// desired lines: space, numbers, end
x = x(ind); %// keep only those lines
If you just want to get the variables, and reject lines that do not have any character, this might work (the data.txt is just a txt generated by the example of data you have given):
fid = fopen('data.txt');
tline = fgets(fid);
while ischar(tline)
tmp = cell2mat(regexp(tline,'\=(.*)','match'));
b=str2double(tmp(2:end));
if ~isnan(b)
disp(b)
end
tline = fgets(fid);
end
fclose(fid);
I am reading the txt file line by line, and using general expressions to get rid of useless chars, and then converting to double the value read.

Matlab Input/output of Several Files

I do matlab operation with two data file whose entries are complex numbers. For example,
fName = '1corr.txt';
f = dlmread('1EA.txt',',');
fid = fopen(fName);
tline = '';
Then I do matrix and other operations between these two files and write my output which I call 'modTrace' as:
modTrace
fileID = fopen('1out.txt','w');
v = [(0:(numel(modTrace)-1)).' real(modTrace(:)) ].';
fprintf(fileID,'%d %e\n',v);
The question is, if I have for example 100 pairs of such data files, like (2corr.txt, 2EA.txt), ....(50corr.txt, 50EA.txt) how can I generalize the input files and how to write all the output files at a time?
First of all, use sprintf to get your variable names depending on the current index.
corrName=sprintf('%dcorr.txt',idx);
EAName=sprintf('%dEA.txt',idx);
outName=sprintf('%dout.txt',idx);
This way, you have one variable (idx) which has to be changed.
Finally put everything into a loop:
n=100
for idx=1:n
corrName=sprintf('%dcorr.txt',idx);
EAName=sprintf('%dEA.txt',idx);
outName=sprintf('%dout.txt',idx);
f = dlmread(EAName,',');
fid = fopen(corrName);
tline = '';
modTrace
fileID = fopen(outName,'w');
v = [(0:(numel(modTrace)-1)).' real(modTrace(:)) ].';
fprintf(fileID,'%d %e\n',v);
end
Instead of hardcoding the number 100, you could also use n=numel(dir('*EA.txt')). It count's the files ending with EA.txt

Load multiple tab delimited text files

I have boatloads of tab delimited textfiles that contain numerical data in 1000x2 format.
They're named file00001.txt - file10000.txt
I would like to write a script to load each of these files and make a variable containing ONLY the 400th row of the 2nd column of each of these files.
After that I'm going to try and plot a graph with the data I collected - but that's not important here.
I would be very grateful for your help.
Edit -
My most recent endeavour is:
numfiles = 10;
mydata = cell(1, numfiles);
for k = 1:numfiles
myfilename = sprintf('DM0000%d.txt', k);
mydata{k} = importdata(myfilename);
end
I'm running into a few problems -
1) if numfiles is >9, the 10th file data entry in the mydata variable comes up as []. This may have something to do with the naming method of my files? They're named in this fashion:
DM00000 ...DM00009, DM00010, DM00011, etc.
2) Also this is pretty slow to load, someone said using fopen, if so where should I put it in and how?
I'm guessing it'd be somewhere along the lines of fopen('filename', 'r')?
Based on your edit, this is what I'd recommend:
numfiles = 10;
row = 400;
column = 2;
data = zeros(1, numfiles);
for k = 1:numfiles
filename = sprintf('DM%05d.txt', k);
fid = fopen(filename,'r');
tempdata = textscan(fid, '%f%f');
fclose(fid);
data(k) = tempdata{column}(row);
end
I've updated the formatspec in sprintf to create the filenames correctly (you were missing the padding with zeros). I'm using textscan to import the data as doubles (change the %f to something else if required - check out the formatspec documentation). I also changed data to be a matrix rather than a cell array. You mentioned that you'd want to plot the data, so it'll be easier if it's a matrix and I couldn't see any need to use a cell array here.

Matlab: How to read commented portion of ascii file

I have an ascii file whose first couple hundred lines are commented (followed by the data) that give some information about the data. For example these are couple of lines I snipped out from large number of lines which are commented:
Right now I am only reading the data without comments by using load as:
filename = uigetfile('*.dat', 'Select Input data');
Data = load(filename, '-ascii');
How can I read the commented lines (which end just before the data starts) and pick some comments out of all comments based on some identifications such as Program name and version, Creation date etc. ?
Use textscan to read the lines into a cell array:
fid = fopen(filename, 'r');
C = textscan(fid, '%s', 'Delimiter', '\n');
C = C{:}; %// Flatten cell array
fclose(fid);
Now you can use regexp to manipulate the textual data. For instance, to find the comment lines that contain the string "Creation date", you can do this:
idx = ~cellfun('isempty', regexp(C, "^\s*%.*Creation date"));
where "^\s*% matches the percent sign (%) at the beginning of the line along with any leading whitespace, and the .* matches any number of characters until the occurrence of "Creation date". Needless to say, you can adjust the regular expression pattern to your liking.
The resulting variable idx stores a logical (i.e boolean) vector with "1"s at the positions of the lines matching the pattern (you can obtain their explicit numerical indices with find(idx)). Next you can filter those lines with C(idx) or iterate over them with a for loop.
fid = fopen(filename);
nHeaderRows = 412;
headerCell = cell(nHeaderRows, 1);
for i=1:nHeaderRows
headerCell{i} = fgets(fid);
end
headerText = char(headerCell);