Matlab: Import multiple numeric csv .txt files into single cell array - matlab

I have multiple (say N of them) .txt files consisting of numeric csv data in matrix form. I would like to import each of these data files into one (1 x N) cell array, whilst preserving the original matrix form. If the original data is small, say 3x3, then textscan does the job in the following manner:
fileId = fopen('data1.txt');
A{1} = textscan(fileID, '%d %d %d', 'delimiter',',','CollectOutput',1);
(This will be part of a function.) But what if my .txt files have 100 columns of data? I could write '%d' 100 times in the formatSpec, but there must be a better way?
This seems to be an easy problem, but I'm quite new to Matlab and am at a loss as to how to proceed. Any advice would be much appreciated, thanks!!

For such cases with consistent data in each of those text files, you can use importdata without worrying about format specifiers. Two approaches are discussed here based on it.
Approach 1
filenames = {'data1.txt' 'data2.txt' 'data3.txt'}; %// cell array of filenames
A = cell(1,numel(filenames)); %// Pre-allocation
for k = 1:numel(filenames)
imported_data = importdata(char(filenames(k)));
formatted_data = cellfun(#str2num, imported_data, 'uni', 0);
A{k} = vertcat(formatted_data{:})
end
Approach 2
Assuming those text files are the only .txt files in the current working directory, you can directly get the filenames and use them to store data from them into a cell array, like this -
files = dir('*.txt');
A = cell(1,numel(files)); %// Pre-allocation
for k = 1:numel(files)
imported_data = importdata(files(k).name);
formatted_data = cellfun(#str2num, imported_data, 'uni', 0)
A{k} = vertcat(formatted_data{:})
end

Related

Convert .csv to .out (complex numbers)

I have a csv file that has complex numbers.
This is sample of some numbers I have in the csv file:
(0.12825663763789857+0.20327998150393212j),(0.21890748607218197+0.160563964013564j),(0.28205414129281525+0.09884068776334366j),(0.030927026479380615+0.26334550583848626j)
I want to read this file and then save in (.out) file all the real parts in the first column and all the imaginary parts in the second column (without the imaginary letter j).
Here is one attempt. It is slightly more complicated due to the ( and ) that surround your numbers.
First, use textscan to read the file. Since I guess you don't know how many numers are in the file, read everything into a singe string. Will work with mutiple lines, too:
filename = 'data.csv';
fid = fopen(filename);
content = textscan(fid, '%s');
fclose(fid);
For this purpose, content now is a slightly weird cell array (look at the textscan-docs for details). Just initialize the variable nums which will store the numbers and loop through content (if you know a bit more about your csv file, you might pre-allocate nums):
nums = [];
for c1 = 1:numel(content{1})
Next, split the string at every occurence of ,:
string_list = strsplit(content{1}{c1},',');
This gives another cell array. Loop through it to convert the strings to numbers (and end the outer loop):
for c2 = 1 : numel(string_list)
nums(end+1) = str2num(string_list{c2});
end
end
Last, just store the real and the imaginary part of the numbers in separate columns:
out = [];
out(:,1) = real(nums);
out(:,2) = imag(nums);
and save it to data.out.
Update As you mentioned precision, you could use
dlmwrite('data.out', out, 'precision','%.20f');
However, here you need to understand the floating point representation in Matlab. In particular, try to understand the following:
>> a = 0.12825663763789857
a =
0.1283
>> fprintf('%.20f\n', a)
0.12825663763789857397
>> eps(a)
ans =
2.7756e-17
Note that one could have done this without cenverting the strings to numbers, but the way above would allow you to use the data in Matlab instead of just saving it.
HEre is an attempt without converting your strings to numbers, therefore one does not have to deal with precision. It works with negative real and imaginary numbers, too. + signs are removed when written to the new file, - signs are preserved:
filename = 'data.csv';
fid = fopen(filename);
content = textscan(fid, '%s');
fclose(fid);
fid = fopen('data.out','w');
pattern = '(?<real>-{0,1}\d+.\d+)(?<imag>[+-]\d+.\d+)j';
for c1 = 1:numel(content{1})
result = regexp(content{1}{c1}, pattern, 'names');
for c2 = 1:numel(result)
fprintf(fid, '%s,%s\n', strrep(result(c2).real,'+',''), strrep(result(c2).imag,'+',''));
end
end
fclose(fid);

String vector to array

I am trying to make a script in Matlab that pulls data from a file and generates an array of data. Since the data is a string I've tried to split it into columns, take the transpose, and split it into columns again to populate an array.
When I run the script I don't get any errors, but I also don't get any useful data. I tell it to display the final vector (Full_Array) and I get {1×4 cell} 8 times. When I try to use strsplit I get the error:
'Error using strsplit (line 80) First input must be either a character vector or a string scalar.'
I'm pretty new to Matlab and I honestly have no clue how to fix it after reading through similar threads and the documentation I'm out of ideas. I've attached the code and the data to read in below. Thank you.
clear
File_Name = uigetfile; %Brings up windows file browser to locate .xyz file
Open_File = fopen(File_Name); %Opens the file given by File_Name
File2Vector = fscanf(Open_File,'%s'); %Prints the contents of the file to a 1xN vector
Vector2ColumnArray = strsplit(File2Vector,';'); %Splits the string vector from
%File2Vector into columns, forming an array
Transpose = transpose(Vector2ColumnArray); %Takes the transpose of Vector2ColumnArray
%making a column array into a row array
FullArray = regexp(Transpose, ',', 'split');
The data I am trying to read in comes from a .xyz file that I have titled methylformate.xyz, here is the data:
O2,-0.23799,0.65588,-0.69492;
O1,0.50665,0.83915,1.47685;
C2,-0.32101,2.08033,-0.75096;
C1,0.19676,0.17984,0.49796;
H4,0.66596,2.52843,-0.59862;
H3,-0.67826,2.36025,-1.74587;
H2,-1.03479,2.45249,-0.00927;
H1,0.23043,-0.91981,0.45346;
When I started using Matlab I also had problems with the data structure. The last line
FullArray = regexp(Transpose, ',', 'split');
splits each line and stores it in a cell array. In order to access the individual strings you have to index with curly brackets into FullArray:
FullArray{1}{1} % -> 'O2'
FullArray{1}{2} % -> '-0.23799'
FullArray{2}{1} % -> 'O1'
FullArray{2}{2} % -> '0.50665'
Thereby the first number corresponds to the row and the second to the particular element in the row.
However, there are easier functions in Matlab which load text files based on regular expressions.
Usually, the easiest function for reading mixed data is readtable.
data = readtable('methylformate.txt');
However, in your case this is more complex because
readtable can't cope with .xyz files, so you'd have to copy to .txt
The semi-colons confuse the read and make the last column characters
You can loop through each row and use textscan like so:
fid = fopen('methylformate.xyz');
tline = fgetl(fid);
myoutput = cell(0,4);
while ischar(tline)
myoutput(end+1,:) = textscan(tline, '%s %f %f %f %*[^\n]', 'delim', ',');
tline = fgetl(fid);
end
fclose(fid);
Output is a cell array of strings or doubles (as appropriate).

Save a sparse array in csv

I have a huge sparse matrix a and I want to save it in a .csv. I can not call full(a) because I do not have enough ram memory. So, calling dlmwrite with full(a) argument is not possible. We must note that dlmwrite is not working with sparse formatted matrices.
The .csv format is depicted below. Note that the first row and column with the characters should be included in the .csv file. The semicolon in the (0,0) position of the .csv file is necessary too.
;A;B;C;D;E
A;0;1.5;0;1;0
B;2;0;0;0;0
C;0;0;1;0;0
D;0;2.1;0;1;0
E;0;0;0;0;0
Could you please help me to tackle this problem and finally save the sparse matrix in the desired form?
You can use csvwrite function:
csvwrite('matrix.csv',a)
You could do this iteratively, as follows:
A = sprand(20,30000,.1);
delimiter = ';';
filename = 'filecontaininghugematrix.csv';
dims = size(A);
N = max(dims);
% create names first
idx = 1:26;
alphabet = dec2base(9+idx,36);
n = ceil(log(N)/log(26));
q = 26.^(1:n);
names = cell(sum(q),1);
p = 0;
for ii = 1:n
temp = repmat({idx},ii,1);
names(p+(1:q(ii))) = num2cell(alphabet(fliplr(combvec(temp{:})')),2);
p = p + q(ii);
end
names(N+1:end) = [];
% formats for writing
headStr = repmat(['%s' delimiter],1,dims(2));
headStr = [delimiter headStr(1:end-1) '\n'];
lineStr = repmat(['%f' delimiter],1,dims(2));
lineStr = ['%s' delimiter lineStr(1:end-1) '\n'];
fid = fopen(filename,'w');
% write header
header = names(1:dims(2));
fprintf(fid,headStr,header{:});
% write matrix rows
for ii = 1:dims(1)
row = full(A(ii,:));
fprintf(fid, lineStr, names{ii}, row);
end
fclose(fid);
The names cell array is quite memory demanding for this example. I have no time to fix that now, so think about this part yourself if it is really a problem ;) Hint: just write the header element wise, first A;, then B; and so on. For the rows, you can create a function that maps the index ii to the desired character, in which case the complete first part is not necessary.

Load multiple tab delimited text files

I have boatloads of tab delimited textfiles that contain numerical data in 1000x2 format.
They're named file00001.txt - file10000.txt
I would like to write a script to load each of these files and make a variable containing ONLY the 400th row of the 2nd column of each of these files.
After that I'm going to try and plot a graph with the data I collected - but that's not important here.
I would be very grateful for your help.
Edit -
My most recent endeavour is:
numfiles = 10;
mydata = cell(1, numfiles);
for k = 1:numfiles
myfilename = sprintf('DM0000%d.txt', k);
mydata{k} = importdata(myfilename);
end
I'm running into a few problems -
1) if numfiles is >9, the 10th file data entry in the mydata variable comes up as []. This may have something to do with the naming method of my files? They're named in this fashion:
DM00000 ...DM00009, DM00010, DM00011, etc.
2) Also this is pretty slow to load, someone said using fopen, if so where should I put it in and how?
I'm guessing it'd be somewhere along the lines of fopen('filename', 'r')?
Based on your edit, this is what I'd recommend:
numfiles = 10;
row = 400;
column = 2;
data = zeros(1, numfiles);
for k = 1:numfiles
filename = sprintf('DM%05d.txt', k);
fid = fopen(filename,'r');
tempdata = textscan(fid, '%f%f');
fclose(fid);
data(k) = tempdata{column}(row);
end
I've updated the formatspec in sprintf to create the filenames correctly (you were missing the padding with zeros). I'm using textscan to import the data as doubles (change the %f to something else if required - check out the formatspec documentation). I also changed data to be a matrix rather than a cell array. You mentioned that you'd want to plot the data, so it'll be easier if it's a matrix and I couldn't see any need to use a cell array here.

Reading CSV with mixed type data

I need to read the following csv file in MATLAB:
2009-04-29 01:01:42.000;16271.1;16271.1
2009-04-29 02:01:42.000;2.5;16273.6
2009-04-29 03:01:42.000;2.599609;16276.2
2009-04-29 04:01:42.000;2.5;16278.7
...
I'd like to have three columns:
timestamp;value1;value2
I tried the approaches described here:
Reading date and time from CSV file in MATLAB
modified as:
filename = 'prova.csv';
fid = fopen(filename, 'rt');
a = textscan(fid, '%s %f %f', ...
'Delimiter',';', 'CollectOutput',1);
fclose(fid);
But it returs a 1x2 cell, whose first element is a{1}='ÿþ2', the other are empty.
I had also tried to adapt to my case the answers to these questions:
importing data with time in MATLAB
Read data files with specific format in matlab and convert date to matal serial time
but I didn't succeed.
How can I import that csv file?
EDIT After the answer of #macduff i try to copy-paste in a new file the data reported above and use:
a = textscan(fid, '%s %f %f','Delimiter',';');
and it works.
Unfortunately that didn't solve the problem because I have to process csv files generated automatically, which seems to be the cause of the strange MATLAB behavior.
What about trying:
a = textscan(fid, '%s %f %f','Delimiter',';');
For me I get:
a =
{4x1 cell} [4x1 double] [4x1 double]
So each element of a corresponds to a column in your csv file. Is this what you need?
Thanks!
Seems you're going about it the right way. The example you provide poses no problems here, I get the output you desire. What's in the 1x2 cell?
If I were you I'd try again with a smaller subset of the file, say 10 lines, and see if the output changes. If yes, then try 100 lines, etc., until you find where the 4x1 cell + 4x2 array breaks down into the 1x2 cell. It might be that there's an empty line or a single empty field or whatever, which forces textscan to collect data in an additional level of cells.
Note that 'CollectOutput',1 will collect the last two columns into a single array, so you'll end up with 1 cell array of 4x1 containing strings, and 1 array of 4x2 containing doubles. Is that indeed what you want? Otherwise, see #macduff's post.
I've had to parse large files like this, and I found I didn't like textscan for this job. I just use a basic while loop to parse the file, and I use datevec to extract the timestamp components into a 6-element time vector.
%% Optional: initialize for speed if you have large files
n = 1000 %% <# of rows in file - if known>
timestamp = zeros(n,6);
value1 = zeros(n,1);
value2 = zeros(n,1);
fid = fopen(fname, 'rt');
if fid < 0
error('Error opening file %s\n', fname); % exit point
end
cntr = 0
while true
tline = fgetl(fid); %% get one line
if ~ischar(tline), break; end; % break out of loop at end of file
cntr = cntr + 1;
splitLine = strsplit(tline, ';'); %% split the line on ; delimiters
timestamp(cntr,:) = datevec(splitLine{1}, 'yyyy-mm-dd HH:MM:SS.FFF'); %% using datevec to parse time gives you a standard timestamp vector
value1(cntr) = splitLine{2};
value2(cntr) = splitLine{3};
end
%% Concatenate at the end if you like
result = [timestamp value1 value2];