I have consecutive .dat files which I want to read and input into a single matrix by concatenating the files vertically. The code I have so far works fine for simple numeric files with only tabs as delimiter.
import=[];
data=[];
for i = 1:32
data1=[import dlmread(sprintf('%d.dat',i))];
data=vertcat(data, data1);
clear data1;
end
and I get the correct output in the data matrix. However, my file format is as follows:
first second third
0 11/15 08:57:42.000 54 67 82
1 11/15 09:48:47.010 49 32 31
...
As you can see, there are three delimiters (:, \t and /), and only the last three columns, which are the ones I want to read, have headers. That is, I want a matrix:
54 67 82
49 32 31
...
I tried passing the delimiters to dlmread, along with how many rows/columns to skip, but an error occurs in the sprintf call inside dlmread ('delimiter = sprintf(delimiter); % Interpret \t (if necessary)'). Does anyone have any idea how to go about this?
UPDATE:
I managed to get a little further
data=[];
for i = 1:32
filename = sprintf( '%d.dat',i );
data1=importdata(filename);%creates a cell array
data2=cell2mat(data1(3:end,:));%converts it to char
%The data, without the header, start from the 3rd row.
data=vertcat(data, data2); %concatenate vertically all the files
clear data1; clear data2;
end
%the data
a1=str2num(data(1:end,20:25));%the first data column is in char 20-25
a2=str2num(data(1:end,30:35));%the second data column is in char 30-35
The problem is that the last part takes far too long: over an hour passed before I stopped it manually. Does anyone know a simpler and faster way to do this?
I managed to solve this myself, so I am posting it here for future reference:
data = [];
for i = 1:32
filename = sprintf( '%d.dat',i );
data1 = dlmread(filename,'',2,3); % zero-based offsets: skip the first 2 rows and the first 3 columns
data=vertcat(data, data1);
clear data1;
end
Now the data matrix contains only my data columns and it runs in a few seconds.
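An alternative sketch, using textscan instead of dlmread offsets. It assumes every file has the same fixed number of header lines: one in the made-up test files below, whereas the dlmread(filename,'',2,3) call suggests two in the real files, so adjust 'HeaderLines' accordingly. The %* conversions read and discard the index, date and time fields, and collecting each file's block in a cell array avoids growing data inside the loop.

```matlab
% Build two small example files in the question's format inside a temporary folder
% (folder and contents are hypothetical stand-ins for 1.dat ... 32.dat)
folder = tempname;
mkdir(folder);
sample = {'0 11/15 08:57:42.000 54 67 82', '1 11/15 09:48:47.010 49 32 31'};
nFiles = 2;                                   % 32 in the question
for i = 1:nFiles
    fid = fopen(fullfile(folder, sprintf('%d.dat', i)), 'w');
    fprintf(fid, 'first second third\n');     % one header line in this sketch
    fprintf(fid, '%s\n', sample{:});
    fclose(fid);
end

% '%*f %*s %*s' reads and discards the index, date and time fields;
% the three '%f' conversions keep the data columns
parts = cell(nFiles, 1);                      % per-file blocks, concatenated once at the end
for i = 1:nFiles
    fid = fopen(fullfile(folder, sprintf('%d.dat', i)), 'r');
    c = textscan(fid, '%*f %*s %*s %f %f %f', 'HeaderLines', 1);
    fclose(fid);
    parts{i} = cell2mat(c);                   % one N-by-3 block per file
end
data = vertcat(parts{:});
rmdir(folder, 's');                           % clean up the temporary files
```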
Related
I have a text file that looks like the following, for example:
1 12 34
67 56 78 98 98
...
Basically rows of numbers, but the row length isn't fixed.
Is there a quick way to read this in matlab, and maybe store the content in a cell array?
importdata would read your file and pad the missing entries with NaNs, which could be used.
Or you can parse your file directly:
str = fileread('file.txt'); %read your file into a string
data = cellfun(@(line) cellfun(@str2double, strsplit(line, ' '), 'UniformOutput', false), strsplit(str, '\r\n'), 'UniformOutput', false);
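The outer strsplit splits the text at the Windows line breaks ('\r\n'), giving the first cell array with one cell per line (for files with Unix line endings, split at '\n' instead).
Each of those lines is then split at the spaces, and str2double converts every piece to a number.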
You could also use str2double to convert each split line to numbers directly, but then each line becomes a normal numeric array, which means the access would be data{1}(2) instead of data{1}{2}. I would prefer to keep the styles aligned, but this version works too:
data = cellfun(@(line) str2double(strsplit(line, ' ')), strsplit(str, '\r\n'), 'UniformOutput', false);
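A variant of the direct parsing approach, sketched under the assumption that every line holds only whitespace-separated numbers: splitting on the regular expression '\r?\n' handles both Windows and Unix line endings, and sscanf turns each line straight into a numeric row vector, so rows{k}(j) is the j-th number on line k. The temporary test file is made up from the example in the question.

```matlab
% Write a small ragged example file (hypothetical temporary path)
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '1 12 34\n67 56 78 98 98\n');
fclose(fid);

txt = fileread(fname);
textLines = regexp(txt, '\r?\n', 'split');                  % works for \n and \r\n endings
textLines = textLines(~cellfun(@isempty, strtrim(textLines)));  % drop blank lines, e.g. after the last newline
rows = cellfun(@(s) sscanf(s, '%f').', textLines, 'UniformOutput', false);  % one numeric row vector per line
delete(fname);
```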
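Try textscan:
fid = fopen('test.txt');
data = textscan(fid,'%s','Delimiter','\n'); % data{1} holds one string per line
%data = textscan(fid,'%f'); % alternative: all numbers in a single column (row structure is lost)
fclose(fid);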
I have a text file containing 1 million decimal digits of the number e. The file has 12501 lines with 80 digits on each line, except for the first and last lines, which have 76 and 4 digits respectively. I want to convert it into a vector in MATLAB with one digit in each row. I tried the num2str function, but the problem is that the numbers get converted to something like '7.1828e79' (13 characters). What can I do?
P.S.1: The first two lines of the text file (76 and 80 digits) are:
7182818284590452353602874713526624977572470936999595749669676277240766303535
47594571382178525166427427466391932003059921817413596629043572900334295260595630
P.S.2: I used dlmread and got a 12501x1 vector whose first and second rows are 7.18281828459045e+75 and 4.75945713821785e+79. The problem is that when I use num2str, for example on the first row value, I get '7.182818284590453e+75' as a string rather than the whole 76 digits. My aim was to do something like this:
e1=dlmread('e.txt');
es1=num2str(e1);
for i=1:12501
for j=1:length(es1(1,:))
a1((i-1)*length(es1(1,:))+j)=es1(i,j);
end
end
e_digits=a1.';
but I get a string like this:
a1='7.182818284590453e+754.759457138217852e+797.381323286279435e+799.244761460668082e+796.133138458300076e+791.416928368190255e+79 5...'
with 262521 characters instead of 1 million digits.
P.S.3: I think the problem might be solved if I could rewrite the text file so that there is one digit on each line, and then simply use dlmread.
Well, this is not hard; there are many ways to do it.
First, load your file as a char array using something simple like this (a char array is easy to manipulate, so you can get rid of the line breaks):
C = fileread('yourfile.txt'); %loads file as Char Array
D = C(~isspace(C)); %removes whitespace, including the line breaks
Next, insert a space after each character (str2num, used in the next step, needs the spaces to treat each digit as a separate number). You can do this with reshape, or simply with a regular expression followed by strtrim:
E = strtrim(regexprep(D, '.{1}', '$0 ')); %Add a SPACE after each Numeric Digit
Now you can convert it into a vector with str2num:
e_digits = str2num(E)'; %converts the char array to a column vector of digits
which gives you a single digit in each row.
Using your example, I get a 156x1 vector with one digit in each row, as you required.
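For a million digits, inserting spaces and calling str2num does more work than necessary. A sketch of a more direct variant: keep only the digit characters and subtract the character '0', which maps '0'..'9' to the numbers 0..9. The temporary test file below (a hypothetical path) holds just the two sample lines from the question.

```matlab
% Small test file holding the two sample lines from the question (hypothetical temporary path)
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', ...
    '7182818284590452353602874713526624977572470936999595749669676277240766303535', ...
    '47594571382178525166427427466391932003059921817413596629043572900334295260595630');
fclose(fid);

C = fileread(fname);             % whole file as one char row vector
D = C(isstrprop(C, 'digit'));    % keep only digit characters, dropping the line breaks
e_digits = double(D(:)) - '0';   % character codes minus the code of '0' gives 0..9, one per row
delete(fname);
```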
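You can get one digit per row like this:
fid = fopen('e.txt','r');
c = textscan(fid,'%s'); % each line of digits as one string
fclose(fid);
c=cat(1,c{:});
c = cellfun(@(x) str2num(reshape(x,[],1)),c,'un',0); % column of single characters -> column of digits
c=cat(1,c{:});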
And it is not the only possible way.
Could you please tell us what the final task is? How do you plan to use the array of e digits?
OK, coming from Python and never having used MATLAB before, it seems unnecessarily hard to write data to a CSV file using MATLAB...
So my data looks like this:
col1 A2A B2 CC3 D5
asd189 123 33 71119 18291
as33d 1311 31 NaN 1011
asd189 NaN 44 79 191
It has N header columns that are made of alphanumeric strings.
It has a leftmost column of length M which is made of alphanumeric strings.
It has an (M-1) x (N-1) array of NUMERIC data, with possible NaNs.
Can you please provide code to write this to a CSV file? I cannot use the xlswrite function because I'm on a cluster without Excel installed. I really just want to get on with the actual data analysis. Thanks
You can only write matrices (not cell arrays) directly using csvwrite, and, as you say, xlswrite needs Excel installed, so that leaves you with low-level operations. You can find walkthroughs for writing to text files in the MATLAB documentation; code for your example is below:
% Initialise example cell array
M = {'col1', 'A2A', 'B2', 'CC3', 'D5'
'asd189', 123, 33, 71119, 18291
'as33d', 1311, 31, NaN, 1011
'asd189', NaN, 44, 79, 191};
% Open a file for writing to (doesn't have to already exist, can specify full directory)
fID = fopen('test.csv','w');
% Write header line, formatted as strings with comma delimiter. Note \r\n for new line
fprintf(fID, [repmat('%s, ', 1, size(M,2)-1),'%s\r\n'], M{1,:});
% Loop through other rows
for row = 2:size(M,1)
% Write each line of cell array, with first column formatted as string
% and other columns formatted as floats
fprintf(fID, ['%s, ', repmat('%f, ', 1, size(M,2)-2),'%f\r\n'], M{row,:});
end
% Close file after writing
fclose(fID);
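The resulting test.csv contains:
col1, A2A, B2, CC3, D5
asd189, 123.000000, 33.000000, 71119.000000, 18291.000000
as33d, 1311.000000, 31.000000, NaN, 1011.000000
asd189, NaN, 44.000000, 79.000000, 191.000000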
Use writetable. It makes writing to CSV (or to an Excel file, or to other text-delimited file formats) much easier than using csvwrite, or xlswrite, or low-level commands such as fprintf.
>> t = table({'asd189';'as33d';'asd189'},[123;1311;NaN],[33;31;44],[71119;NaN;79],[18291;1011;191]);
>> t.Properties.VariableNames = {'col1','A2A','B2','CC3','D5'}
t =
col1 A2A B2 CC3 D5
________ ____ __ _____ _____
'asd189' 123 33 71119 18291
'as33d' 1311 31 NaN 1011
'asd189' NaN 44 79 191
>> writetable(t,'myfile.csv')
If your data is currently not stored as a table (maybe it's in an array or cell array), it's pretty easy to convert to a table using utility functions such as array2table or cell2table. You will only pay a small time penalty for doing this.
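A sketch of the cell2table route for data held in a cell array like M in the fprintf answer above, assuming the first row holds the column names and that these are valid MATLAB identifiers (as they are here). The output file name is a hypothetical temporary path.

```matlab
% The cell array from the question: header row followed by the data rows
M = {'col1',   'A2A', 'B2', 'CC3', 'D5'
     'asd189', 123,   33,   71119, 18291
     'as33d',  1311,  31,   NaN,   1011
     'asd189', NaN,   44,   79,    191};

% Data rows become table columns; the header row supplies the variable names
t = cell2table(M(2:end,:), 'VariableNames', M(1,:));

fname = [tempname '.csv'];   % hypothetical output path
writetable(t, fname);        % writes the variable names as the first line, then the rows
```

Numeric columns such as A2A come out as ordinary double columns (NaN included), while col1 stays a cell array of character vectors.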
PS: you don't need Excel to be installed in order to write to an Excel file. You may not be able to read the files afterwards, but MATLAB can still write them. It sounds like you'd prefer .csv anyway, though.
I have a very large CSV file containing three columns. I want to load these columns into a MATLAB matrix as fast as possible.
Currently, this is what I do:
fid = fopen(inputfile, 'rt');
g = textscan(fid,'%s','delimiter','\r\n');
tdata = g{1};
fclose(fid);
results = zeros(numel(tdata)-3, 3);
tic
display('start reading data...');
for r = 4:numel(tdata)
if ~mod(r, 100)
display(['data row: ' num2str(r) ' / ' num2str(numel(tdata))]);
end
entries = strsplit(tdata{r}, ',');
results(r-3,1) = str2double(strrep(entries{1},',', '.'));
results(r-3,2) = str2double(strrep(entries{2},',', '.'));
results(r-3,3) = str2double(strrep(entries{3},',', '.'));
end
However, this takes ~30 seconds for 200,000 lines, i.e. 150 µs per line, which is really slow. The code is also not accepted by parfor.
Now I would like to know what causes the bottleneck in the for loop and how I can speed it up.
Here are the measured times:
str2double 578253 calls 29.631s
strsplit 192750 calls 13.388s
EDIT:
The file content has this structure:
0.000000, -0.00271, 5394147
0.000667, -0.00271, 5394148
0.001333, -0.00271, 5394149
0.002000, -0.00271, 5394150
I think a lot can be improved by calling textscan differently.
You do this:
g = textscan(fid,'%s','delimiter','\r\n');
But then call tdata = g{1};
If textscan is called correctly, it already splits all your data and returns it as numbers.
Try this:
g = textscan(fid, '%f %f %f', 'Delimiter', ',');
It returns a 1x3 cell array holding one column of values per cell. To convert it to a matrix you can use:
g=cell2mat(g)
I imported 200k lines in 0.12 seconds.
Your code also seems to contain some other workarounds. You start at r=4, so apparently there are 3 lines you don't want to read. After fopen, you can call
[~] = fgetl(fid);
three times to get to the interesting part of your file.
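A self-contained sketch of the textscan route, assuming the file starts with exactly 3 header lines (their text below is made up, and the file is a hypothetical temporary path): the 'HeaderLines' option skips them in the same call, so the fgetl calls are not needed.

```matlab
% Write a small test file: 3 made-up header lines, then rows in the question's format
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'header 1\nheader 2\nheader 3\n');
fprintf(fid, '%f, %.5f, %d\n', [0 -0.00271 5394147; 0.000667 -0.00271 5394148].');
fclose(fid);

fid = fopen(fname, 'r');
% 'HeaderLines' skips the first 3 lines; ',' separates the three numeric fields
g = textscan(fid, '%f %f %f', 'Delimiter', ',', 'HeaderLines', 3);
fclose(fid);
results = cell2mat(g);   % N-by-3 double matrix
delete(fname);
```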
You also first split each line using ',' as the separator, but then replace every ',' with '.'. That does nothing: all the commas are already gone, since they were used as separators.
If you used csvread, you wouldn't need str2double or strsplit, which your timings show are the slow calls; it is likely much quicker for a CSV file.
You could replace all of the above code with:
results = csvread(inputfile, 3, 0); % zero-based row offset 3 skips the 3 lines before the data
I want to write a list of data to a text file, and for that I use:
fprintf(fid, '%d %s %d\n',ii, names{ii},vals(ii));
The problem is that some names in my data are longer than others, so I get results in this form:
1 XXY 5
2 NHDMUCY 44
3 LL 96
...
How can I change the fprintf line to get results in this form:
1 XXY      5
2 NHDMUCY 44
3 LL      96
...
Something like this before the start of the loop:
%// numbers printed in the first column, their widths and the corresponding whitespace padding
col1 = 1:numel(names); %// e.g. 99:101 for the second example below
lens0 = cellfun('length',cellfun(@(x) num2str(x),num2cell(col1),'Uni',0))
pad_ws_col1 = max(lens0) - lens0
%// extents of each names string and the corresponding whitespace padding
lens1 = cellfun('length',names)
pad_ws_col2 = max(lens1) - lens1
Then, inside the loop:
fprintf(fid, '%d %s %s %s %d\n',col1(ii), repmat(' ',1,pad_ws_col1(ii)), ...
names{ii},repmat(' ',1,pad_ws_col2(ii)),vals(ii));
The output would be:
1  XXY      5
2  NHDMUCY  44
3  LL       96
For the range 99-101, it would be:
99   XXY      5
100  NHDMUCY  44
101  LL       96
Note that the third-column numbers start at a fixed distance from the start of each row, instead of ending at a fixed distance as asked in the question. But, assuming the whole idea of the question was to present the data in a more readable way, this could work for you.
You can use the char function to convert a cell array of strings into a character array in which all rows are padded with spaces to the length of the longest one.
So for you:
charNames = char( names ) ;
Then you can use fprintf:
fprintf(fid, '%d %s %d\n',ii, charNames(ii,:) , vals(ii)) ;
Just make sure your cell array is a column before you convert it to char.
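Neither approach above right-aligns the last column. A sketch that does, assuming names is a cell array of char and vals holds integers (the sample data is taken from the question; the output file is a hypothetical temporary path): the column widths are computed once and baked into the fprintf format, where '-' left-aligns the names and a plain width right-aligns the numbers.

```matlab
% Sample data from the question
names = {'XXY'; 'NHDMUCY'; 'LL'};
vals  = [5; 44; 96];

% Column widths: widest row number, longest name, widest value
wIdx  = numel(num2str(numel(names)));
wName = max(cellfun('length', names));
wVal  = max(arrayfun(@(v) numel(num2str(v)), vals));

% Builds e.g. '%1d %-7s %2d\n' ('%%' yields a literal %, '\\n' a literal \n)
fmt = sprintf('%%%dd %%-%ds %%%dd\\n', wIdx, wName, wVal);

fname = [tempname '.txt'];   % hypothetical output path
fid = fopen(fname, 'w');
for ii = 1:numel(names)
    fprintf(fid, fmt, ii, names{ii}, vals(ii));
end
fclose(fid);
out = fileread(fname);
delete(fname);
```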