Matlab, read multiple 2d arrays from a csv file - matlab

I have a csv file which contains 2d arrays of 4 columns but a varying number of rows, with a blank line between arrays. Eg:
2, 354, 23, 101
3, 1023, 43, 454
1, 5463, 45, 7657

4, 543, 543, 654
3, 56, 7654, 344
...
I need to import the data so that I can run operations on each block; however, csvread, dlmread and textscan all ignore the blank lines.
I can't seem to find a solution anywhere. How can this be done?
PS:
It may be worth noting that files in the format above are actually the concatenation of many files, each containing only one block of data (I don't want to have to read from thousands of files every time). The blank line between blocks can therefore be changed to any other delimiter or marker; the concatenation is done with a Python script.
EDIT: My Solution - based upon / inspired by petrichor below
I replaced csvread with textscan, which is faster. Then I realised that if I replaced the blank lines with lines of nan instead (by modifying my Python script), I could remove the need for a second textscan, which was the slow point. My code is:
filename = 'data.csv';
fid = fopen(filename);
allData = cell2mat(textscan(fid,'%f %f %f %f','delimiter',','));
fclose(fid);
nanLines = find(isnan(allData(:,1)))';
iEnd = (nanLines - (1:length(nanLines)));
iStart = [1 (nanLines(1:end-1) - (0:length(nanLines)-2))];
nRows = iEnd - iStart + 1;
allData(nanLines,:)=[];
data = mat2cell(allData, nRows);
This runs in 0.28 s on a file of 103000 lines. I've accepted petrichor's solution, as it best solves my initial problem.
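The index arithmetic in the NaN-marker approach can also be written with cumsum and accumarray. The following is only a sketch on an in-memory matrix: the sample values come from the question, while the variable names isSep and blockId are made up for illustration. It assumes a separator row has NaN in its first column.

```matlab
% Sample data as textscan would return it, with NaN rows marking block ends.
allData = [2 354 23 101; 3 1023 43 454; 1 5463 45 7657; NaN NaN NaN NaN; ...
           4 543 543 654; 3 56 7654 344; NaN NaN NaN NaN];

isSep   = isnan(allData(:,1));             % true for separator rows
blockId = cumsum(isSep) + 1;               % block number of every row
nRows   = accumarray(blockId(~isSep), 1)'; % number of data rows per block
data    = mat2cell(allData(~isSep,:), nRows);
```

Because the block sizes are counted rather than derived from the separator positions, this variant should not depend on whether the file ends with a separator row.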

filename = 'data.txt';
%# Read all the data
allData = csvread(filename);
%# Compute the empty line indices
fid = fopen(filename);
lines = textscan(fid, '%s', 'Delimiter', '\n');
fclose(fid);
blankLines = find(cellfun('isempty', lines{1}))';
%# Find the indices to separate data into cells from the whole matrix
iEnd = [blankLines - (1:length(blankLines)) size(allData,1)];
iStart = [1 (blankLines - (0:length(blankLines)-1))];
nRows = iEnd - iStart + 1;
%# Put the data into cells
data = mat2cell(allData, nRows)
That gives the following for your data:
data =
[3x4 double]
[2x4 double]

Related

How to read a single character in file using MATLAB?

In my file data.txt, I have a string abcdefgh. I want to read just 1 character without reading the whole string. How can I do this in MATLAB?
For example, to take the first character I tried c = fscanf(data.txt, '%c'); and c = textscan(data.txt, '%c'); but both read the whole line in data.txt. I know that c(1) is my answer, but I don't want to do that.
You can limit the number of characters that are read in using the third input to either fscanf or textscan.
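fid = fopen('data.txt', 'r');
c = fscanf(fid, '%c', 1); %# read one character as a char
%# or, instead of the fscanf call (textscan returns a 1x1 cell, so use c{1}):
c = textscan(fid, '%c', 1);
fclose(fid);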
You could also just use a lower-level function such as fread to do this.
fid = fopen('data.txt', 'r');
c = fread(fid, 1, '*char');
fclose(fid);
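The same idea extends to a character other than the first by moving the file position before reading. This is a minimal sketch, assuming a single-byte encoding (one byte per character); the temporary file and the index k are hypothetical and only make the example self-contained.

```matlab
% Write a small sample file (stand-in for data.txt).
fName = [tempname '.txt'];
fid = fopen(fName, 'w');
fprintf(fid, 'abcdefgh');
fclose(fid);

k = 3;                          % 1-based index of the character to read
fid = fopen(fName, 'r');
fseek(fid, k - 1, 'bof');       % skip k-1 bytes from the beginning of the file
c = fread(fid, 1, '*char');     % read exactly one character
fclose(fid);
```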

Appending data to a file in Matlab, removing before a symbol

I have a file which is written via Matlab from a vector M with binary data values. This file is written with Matlab's fwrite in the following script myGenFile.m of the form function myGenFile(fName, M):
% open output file
fId = fopen(fName, 'W');
% start by writing some things to the file
fprintf(fId, '{DATA BITLENGTH:%d}', length(M));
fprintf(fId, '{DATA LIST-%d:#', ceil(length(M) / 8) + 1);
% pad to full bytes
lenRest = mod(length(M), 8);
M = [M, zeros(1, 8 - lenRest)];
% reverse order in bytes
M = reshape(M, 8, ceil(length(M) / 8));
MReversed = flipud(M); % reverse the bit order within each byte (one byte per column)
MM = reshape(MReversed, 1, []);
fwrite(fId, MM, 'ubit1');
% write some ending of the file
fprintf(fId, '}');
fclose(fId);
Now I want to write a file myAppendFile.m, which appends some values to the existing file and has the following form: function myAppendFile(newData, fName). To do this I will have to remove the trailing '}':
fId = fopen(fName,'r');
oldData = textscan(fId, '%s', 'Delimiter', '\n');
% remove the last character of the file; aka the ending '}'
oldData{end}{end} = oldData{end}{end}(1:end-1);
The problem now is writing oldData back into the file, since it is a cell array of cell arrays containing strings. (Writing newData should be trivial, since it is also a vector of binary data like M.)
How could I overcome this issue and append the new data correctly?
Instead of using textscan, which copies the whole file into memory before you write it back, you could use fseek to move the file position to where you want to continue writing. Put it one character before the end of the file and continue writing from there.
fseek(fid, -1, 'eof');
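Put together, the append step might look like the sketch below. It rests on several assumptions: the file ends with exactly one '}' byte, the new bit vector has a length that is a multiple of 8, and the file is opened with 'r+' so that existing content is kept and writes land at the current position. The sample file contents are invented, and the sketch neither reverses the bit order nor updates the length fields in the header, both of which the real function would probably need.

```matlab
% Create a small stand-in file that ends with '}' (contents are illustrative).
fName = [tempname '.bin'];
fId = fopen(fName, 'w');
fprintf(fId, '{DATA LIST-1:#');
fwrite(fId, [1 0 0 0 0 0 1 0], 'ubit1');
fprintf(fId, '}');
fclose(fId);
d = dir(fName);
sizeBefore = d.bytes;              % file size before appending

newData = [0 1 1 0 0 1 1 0];       % bits to append; length is a multiple of 8

fId = fopen(fName, 'r+');          % read/update mode: existing content is kept
fseek(fId, -1, 'eof');             % move onto the trailing '}'
fwrite(fId, newData, 'ubit1');     % overwrite '}' and extend the file
fprintf(fId, '}');                 % write the closing brace again
fclose(fId);
```

After this runs, the file should be one byte longer than before and still end with '}'.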

Reading CSV with mixed type data

I need to read the following csv file in MATLAB:
2009-04-29 01:01:42.000;16271.1;16271.1
2009-04-29 02:01:42.000;2.5;16273.6
2009-04-29 03:01:42.000;2.599609;16276.2
2009-04-29 04:01:42.000;2.5;16278.7
...
I'd like to have three columns:
timestamp;value1;value2
I tried the approaches described here:
Reading date and time from CSV file in MATLAB
modified as:
filename = 'prova.csv';
fid = fopen(filename, 'rt');
a = textscan(fid, '%s %f %f', ...
'Delimiter',';', 'CollectOutput',1);
fclose(fid);
But it returns a 1x2 cell, whose first element is a{1}='ÿþ2'; the others are empty.
I had also tried to adapt to my case the answers to these questions:
importing data with time in MATLAB
Read data files with specific format in matlab and convert date to matal serial time
but I didn't succeed.
How can I import that csv file?
EDIT: After @macduff's answer, I tried copy-pasting the data shown above into a new file and used:
a = textscan(fid, '%s %f %f','Delimiter',';');
and it works.
Unfortunately that didn't solve the problem because I have to process csv files generated automatically, which seems to be the cause of the strange MATLAB behavior.
What about trying:
a = textscan(fid, '%s %f %f','Delimiter',';');
For me I get:
a =
{4x1 cell} [4x1 double] [4x1 double]
So each element of a corresponds to a column in your csv file. Is this what you need?
Thanks!
Seems you're going about it the right way. The example you provide poses no problems here, I get the output you desire. What's in the 1x2 cell?
If I were you I'd try again with a smaller subset of the file, say 10 lines, and see if the output changes. If yes, then try 100 lines, etc., until you find where the 4x1 cell + 4x2 array breaks down into the 1x2 cell. It might be that there's an empty line or a single empty field or whatever, which forces textscan to collect data in an additional level of cells.
Note that 'CollectOutput',1 will collect the last two columns into a single array, so you'll end up with 1 cell array of 4x1 containing strings, and 1 array of 4x2 containing doubles. Is that indeed what you want? Otherwise, see @macduff's post.
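One possible explanation for the 'ÿþ' prefix: those two characters are how the bytes FF FE appear when read as single-byte text, and FF FE is the UTF-16 little-endian byte-order mark. If the automatically generated files are indeed UTF-16LE (an assumption, not something confirmed in the question), the text could be decoded by hand and then passed to textscan as a character vector. The sketch below builds its own sample file so it can run on its own; it only handles characters that fit in a single 16-bit code unit.

```matlab
% Build a small UTF-16LE sample file with a BOM (illustrative data only).
fName = [tempname '.csv'];
txt = sprintf('2009-04-29 01:01:42.000;16271.1;16271.1\n2009-04-29 02:01:42.000;2.5;16273.6\n');
fid = fopen(fName, 'w', 'l');                 % 'l': little-endian byte order
fwrite(fid, [65279, double(txt)], 'uint16');  % 65279 is U+FEFF, the BOM
fclose(fid);

% Read the file back as 16-bit code units and drop the BOM if present.
fid = fopen(fName, 'r', 'l');
raw = fread(fid, Inf, 'uint16=>char')';
fclose(fid);
if ~isempty(raw) && double(raw(1)) == 65279
    raw(1) = [];
end

% Parse the decoded text instead of the file handle.
a = textscan(raw, '%s %f %f', 'Delimiter', ';');
```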
I've had to parse large files like this, and I found I didn't like textscan for this job. I just use a basic while loop to parse the file, and I use datevec to extract the timestamp components into a 6-element time vector.
%% Optional: initialize for speed if you have large files
n = 1000 %% <# of rows in file - if known>
timestamp = zeros(n,6);
value1 = zeros(n,1);
value2 = zeros(n,1);
fid = fopen(fname, 'rt');
if fid < 0
error('Error opening file %s\n', fname); % exit point
end
cntr = 0
while true
tline = fgetl(fid); %% get one line
if ~ischar(tline), break; end; % break out of loop at end of file
cntr = cntr + 1;
splitLine = strsplit(tline, ';'); %% split the line on ; delimiters
timestamp(cntr,:) = datevec(splitLine{1}, 'yyyy-mm-dd HH:MM:SS.FFF'); %% using datevec to parse time gives you a standard timestamp vector
value1(cntr) = str2double(splitLine{2}); %% strsplit returns text, so convert to a number
value2(cntr) = str2double(splitLine{3});
end
fclose(fid);
%% Concatenate at the end if you like
result = [timestamp value1 value2];

How can I append a cell array to a .txt file?

I previously queried about including matrices and strings in a .txt file. I now need to append cells to it. From my prior question:
str = 'This is the matrix: ';
mat1 = [23 46; 56 67];
fName = 'output.txt';
fid = fopen(fName, 'w');
if fid >= 0
fprintf(fid, '%s\n', str);
fclose(fid);
end
dlmwrite(fName, mat1, '-append', 'newline', 'pc', 'delimiter', '\t');
Now I want to append a string: 'The removed identifiers are' and then this cell array below it:
'ABC' [10011] [2]
'DEF' [10023] [1]
Some relevant links:
http://www.mathworks.com/help/techdoc/ref/fileformats.html, http://www.mathworks.com/support/solutions/en/data/1-1CCMDO/index.html?solution=1-1CCMDO
Unfortunately, you can't use functions like DLMWRITE or CSVWRITE for writing cell arrays of data. However, to get the output you want you can still use a single call to FPRINTF, but you will have to specify the format of all the entries in a row of your cell array. Building on my answer to your previous question, you would add these additional lines:
str = 'The removed identifiers are: '; %# Your new string
cMat = {'ABC' 10011 2; 'DEF' 10023 1}; %# Your cell array
fid = fopen(fName,'a'); %# Open the file for appending
fprintf(fid,'%s\r\n',str); %# Print the string
cMat = cMat.'; %'# Transpose cMat
fprintf(fid,'%s\t%d\t%d\r\n',cMat{:}); %# Print the cell data
fclose(fid); %# Close the file
And the new file contents (including the old example) will look like this:
This is the matrix:
23 46
56 67
The removed identifiers are:
ABC 10011 2
DEF 10023 1
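The transpose matters because cMat{:} expands the cell array in column-major order, while the format string consumes one row of the table at a time. A small sketch using sprintf instead of a file makes this visible; the variable names t and txt are chosen only for illustration.

```matlab
cMat = {'ABC' 10011 2; 'DEF' 10023 1};

% After transposing, t{:} yields 'ABC', 10011, 2, 'DEF', 10023, 1,
% which matches the order of the %s, %d, %d specifiers.
t = cMat.';
txt = sprintf('%s\t%d\t%d\n', t{:});
```

Without the transpose, the arguments would arrive as 'ABC', 'DEF', 10011, 10023, 2, 1, so the second string would be fed to a %d specifier.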
You may use cellwrite from File Exchange. Reading Writing Mixed Data With MATLAB from Francis Barnhart, the creator of cellwrite might be worth a look.
It should be feasible to change cellwrite's signature to accept a file handle, which would allow appending data to an existing file.

Matlab: how to handle abnormal data files

I am trying to import a large number of files into Matlab for processing. A typical file would look like this:
mass intensity
350.85777 238
350.89252 3094
350.98688 2762
351.87899 468
352.17712 569
352.28449 426
Some text and numbers here, describing the experimental setup, eg
Scan 3763 # 81.95, contains 1000 points:
The numbers in the two columns are separated by 8 spaces. However, sometimes the experiment will go wrong and the machine will produce a datafile like this one:
mass intensity
Some text and numbers here, describing the experimental setup, eg
Scan 3763 # 81.95, contains 1000 points:
I found that using space-separated files with a single header row, ie
importdata(path_to_file,' ', 1);
works best for the normal files. However, it totally fails on all the abnormal files. What would the easiest way to fix this be? Should I stick with importdata (already tried all possible settings, it just doesn't work) or should I try writing my own parser? Ideally, I would like to get those values in a Nx2 matrix for normal files and [0 0] for abnormal files.
Thanks.
I don't think you need to create your own parser, nor is this all that abnormal. Using textscan is your best option here.
fid = fopen('input.txt', 'rt');
data = textscan(fid, '%f %u', 'Headerlines', 1);
fclose(fid);
mass = data{1};
intensity = data{2};
Yields:
mass =
350.8578
350.8925
350.9869
351.8790
352.1771
352.2845
intensity =
238
3094
2762
468
569
426
For your 1st file and:
mass =
Empty matrix: 0-by-1
intensity =
Empty matrix: 0-by-1
For your empty one.
By default, textscan treats whitespace as a delimiter, and it only reads what you tell it to until it can no longer do so; thus it ignores the final lines in your file. You can also run a second textscan after this one if you want to pick up those additional fields:
fid = fopen('input.txt', 'rt');
data = textscan(fid, '%f %u', 'Headerlines', 1);
mass = data{1};
intensity = data{2};
data = textscan(fid, '%*s %u %*c %f %*c %*s %u %*s', 'Headerlines', 1);
scan = data{1};
level = data{2};
points = data{3};
fclose(fid);
Along with your mass and intensity data gives:
scan =
3763
level =
81.9500
points =
1000
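To get the Nx2 matrix or [0 0] asked for in the question, the textscan result can be checked for emptiness. This is a sketch under a few assumptions: both columns are read with %f so they can be concatenated as doubles (with %u the second column would be an integer type, and concatenating it with doubles would convert the masses to that integer type), and the two sample files are shortened stand-ins written to temporary locations.

```matlab
% Two small sample files mirroring the question (contents are illustrative).
normalFile = [tempname '.txt'];
fid = fopen(normalFile, 'wt');
fprintf(fid, 'mass intensity\n350.85777 238\n350.89252 3094\nSome text here\n');
fclose(fid);
abnormalFile = [tempname '.txt'];
fid = fopen(abnormalFile, 'wt');
fprintf(fid, 'mass intensity\nSome text here\n');
fclose(fid);

files = {normalFile, abnormalFile};
results = cell(size(files));
for k = 1:numel(files)
    fid = fopen(files{k}, 'rt');
    cols = textscan(fid, '%f %f', 'HeaderLines', 1);
    fclose(fid);
    if isempty(cols{1})
        results{k} = [0 0];              % abnormal file: no numeric rows
    else
        results{k} = [cols{1} cols{2}];  % normal file: N-by-2 matrix
    end
end
```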
What do you mean by 'totally fails on abnormal files'?
You can check whether importdata finds any data using e.g.
>> imported = importdata(path_to_file,' ', 1);
>> isfield(imported, 'data')