How can I append a cell array to a .txt file? - matlab

I previously queried about including matrices and strings in a .txt file. I now need to append cells to it. From my prior question:
str = 'This is the matrix: ';
mat1 = [23 46; 56 67];
fName = 'output.txt';
fid = fopen(fName, 'w');
if fid >= 0
fprintf(fid, '%s\n', str);
fclose(fid);
end
dlmwrite(fName, mat1, '-append', 'newline', 'pc', 'delimiter', '\t');
Now I want to append a string: 'The removed identifiers are' and then this cell array below it:
'ABC' [10011] [2]
'DEF' [10023] [1]
Some relevant links:
http://www.mathworks.com/help/techdoc/ref/fileformats.html, http://www.mathworks.com/support/solutions/en/data/1-1CCMDO/index.html?solution=1-1CCMDO

Unfortunately, you can't use functions like DLMWRITE or CSVWRITE for writing cell arrays of data. However, to get the output you want you can still use a single call to FPRINTF, but you will have to specify the format of all the entries in a row of your cell array. Building on my answer to your previous question, you would add these additional lines:
str = 'The removed identifiers are: '; %# Your new string
cMat = {'ABC' 10011 2; 'DEF' 10023 1}; %# Your cell array
fid = fopen(fName,'a'); %# Open the file for appending
fprintf(fid,'%s\r\n',str); %# Print the string
cMat = cMat.'; %'# Transpose cMat
fprintf(fid,'%s\t%d\t%d\r\n',cMat{:}); %# Print the cell data
fclose(fid); %# Close the file
And the new file contents (including the old example) will look like this:
This is the matrix:
23 46
56 67
The removed identifiers are:
ABC 10011 2
DEF 10023 1

You could use cellwrite from the File Exchange. "Reading Writing Mixed Data With MATLAB" by Francis Barnhart, the creator of cellwrite, might also be worth a look.
It should be feasible to change cellwrite's signature so that it accepts a file handle, which would allow appending data to an already existing file.
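As a rough illustration of writing a mixed cell array through a file identifier opened in append mode, here is a minimal sketch. It is not cellwrite's code. It assumes every column holds either character rows or numeric scalars and that the first row is representative of each column's type. The file name is made up for the demo.

```matlab
% Sketch: append a mixed cell array to a text file via an open file identifier.
% Assumption: the first row tells us the type of every column.
fName = fullfile(tempdir, 'cell_append_demo.txt');   % hypothetical file name
cMat  = {'ABC' 10011 2; 'DEF' 10023 1};

% Build one format specifier per column from the first row
spec = cell(1, size(cMat, 2));
for k = 1:size(cMat, 2)
    if ischar(cMat{1, k})
        spec{k} = '%s';
    else
        spec{k} = '%g';
    end
end
fmt = [strjoin(spec, '\t') '\r\n'];   % tab-separated fields, Windows line ending

fid = fopen(fName, 'a');              % 'a' appends and creates the file if missing
if fid < 0
    error('Could not open %s', fName);
end
fprintf(fid, '%s\r\n', 'The removed identifiers are: ');
cT = cMat.';                          % fprintf consumes its arguments column by column
fprintf(fid, fmt, cT{:});
fclose(fid);
```

Deriving the format from the data avoids hard-coding '%s\t%d\t%d' when the number or type of columns changes.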

Related

Matlab - string containing a number and equal sign

I have a data file that contains parameter names and values with an equal sign in between them. It's like this:
A = 1234
B = 1353.335
C =
D = 1
There is always one space before and after the equal sign. The problem is some variables don't have values assigned to them like "C" above and I need to weed them out.
I want to read the data file (text) into a cell and just remove the lines with those invalid statements or just create a new data file without them.
Whichever is easier, but I will eventually read the file into a cell with textscan command.
The values (numbers) will be treated as double precision.
Please, help.
Thank you,
Eric
Try this:
fid = fopen('file.txt'); %// open file
x = textscan(fid, '%s', 'delimiter', '\n'); %// or '\r'. Read each line into a cell
fclose(fid); %// close file
x = x{1}; %// each cell of x contains a line of the file
ind = ~cellfun(@isempty, regexp(x, '=\s[\d\.]+$')); %// desired lines: space, numbers, end
x = x(ind); %// keep only those lines
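The question also mentions creating a new data file without the invalid lines and treating the values as doubles. A minimal sketch of both steps follows, using the sample lines from the question in place of a real file. The output file name is hypothetical. Note that the pattern only accepts digits and dots, so negative numbers or exponents would need a wider pattern.

```matlab
% Sketch: keep only "name = number" lines, save them, and parse the values.
x = {'A = 1234'; 'B = 1353.335'; 'C ='; 'D = 1'};   % sample lines from the question

% Same filter as above: "=", one whitespace, then digits/dots up to the end
ind = ~cellfun(@isempty, regexp(x, '=\s[\d\.]+$'));
x = x(ind);

% Write the surviving lines to a new file
outName = fullfile(tempdir, 'cleaned.txt');         % hypothetical file name
fid = fopen(outName, 'w');
fprintf(fid, '%s\n', x{:});
fclose(fid);

% Split each line into a name and a double-precision value
tok    = regexp(x, '^(\S+) = (\S+)$', 'tokens', 'once');
names  = cellfun(@(t) t{1}, tok, 'UniformOutput', false);
values = cellfun(@(t) str2double(t{2}), tok);
```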
If you just want to get the variables, and reject lines that do not have any character, this might work (the data.txt is just a txt generated by the example of data you have given):
fid = fopen('data.txt');
tline = fgets(fid);
while ischar(tline)
tmp = cell2mat(regexp(tline,'\=(.*)','match'));
b=str2double(tmp(2:end));
if ~isnan(b)
disp(b)
end
tline = fgets(fid);
end
fclose(fid);
I read the txt file line by line, use a regular expression to extract everything from the equal sign onward, and then convert the text after the equal sign to a double. Lines without a value give NaN and are skipped.

Text Scanning to read in unknown number of variables and unknown number of runs

I am trying to read in a csv file which will have the format
Var1 Val1A Val1B ... Val1Q
Var2 Val2A Val2B ... Val2Q
...
And I will not know ahead of time how many variables (rows) or how many runs (columns) will be in the file.
I have been trying to get text scan to work but no matter what I try I cannot get either all the variable names isolated or a rows by columns cell array. This is what I've been trying.
fID = fopen(strcat(pwd,'/',inputFile),'rt');
if fID == -1
disp('Could not find file')
return
end
vars = textscan(fID, '%s,%*s','delimiter','\n');
fclose(fID);
Does anyone have a suggestion?
If the file has the same number of columns in each row (you just don't know how many to begin with), try the following.
First, determine the number of columns by parsing just the first row, then parse the full file:
% Open the file, get the first line
fid = fopen('myfile.txt');
line = fgetl(fid);
fclose(fid);
tmp = textscan(line, '%s');
% textscan returns a 1x1 cell; its contents hold one entry per column
n = numel(tmp{1});
% Now scan the file
fid = fopen('myfile.txt');
tmp = textscan(fid, repmat('%s ', [1, n]));
fclose(fid);
For any given file, are all the lines equal length? If they are, you could start by reading in the first line and use that to count the number of fields and then use textscan to read in the file.
fID = fopen(strcat(pwd,'/',inputFile),'rt');
firstLine = fgetl(fID);
numFields = length(strfind(firstLine,' ')) + 1;
fclose(fID);
formatString = repmat('%s',1,numFields);
fID = fopen(strcat(pwd,'/',inputFile),'rt');
vars = textscan(fID, formatString, 'Delimiter', ' ');
fclose(fID);
Now you will have a cell array whose first entry holds the variable names and whose remaining entries hold the observations.
In this case I assumed the delimiter was space even though you said it was a csv file. If it is really commas, you can change the code accordingly.
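For a file that really is comma-delimited, the same idea might look like the sketch below. It assumes every row has the same number of fields and that everything after the first field is numeric. The file name and sample contents are invented so the example runs on its own.

```matlab
% Sketch: read a comma-delimited file with an unknown but constant column count.
fileName = fullfile(tempdir, 'runs_demo.csv');      % hypothetical file name
fid = fopen(fileName, 'w');
fprintf(fid, 'Var1,1.0,2.0,3.0\nVar2,4.0,5.0,6.0\n');
fclose(fid);

% Count the fields on the first line (empty fields are kept, not collapsed)
fid = fopen(fileName, 'rt');
firstLine = fgetl(fid);
numFields = numel(strsplit(firstLine, ',', 'CollapseDelimiters', false));

% Rewind and read: one string column followed by numeric columns
frewind(fid);
formatString = ['%s' repmat('%f', 1, numFields - 1)];
data = textscan(fid, formatString, 'Delimiter', ',', 'CollectOutput', true);
fclose(fid);

varNames = data{1};   % numVars-by-1 cell array of names
values   = data{2};   % numVars-by-numRuns double matrix
```

With 'CollectOutput' enabled, the numeric columns come back as a single matrix, which gives the rows-by-columns layout directly.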

Reading CSV with mixed type data

I need to read the following csv file in MATLAB:
2009-04-29 01:01:42.000;16271.1;16271.1
2009-04-29 02:01:42.000;2.5;16273.6
2009-04-29 03:01:42.000;2.599609;16276.2
2009-04-29 04:01:42.000;2.5;16278.7
...
I'd like to have three columns:
timestamp;value1;value2
I tried the approaches described here:
Reading date and time from CSV file in MATLAB
modified as:
filename = 'prova.csv';
fid = fopen(filename, 'rt');
a = textscan(fid, '%s %f %f', ...
'Delimiter',';', 'CollectOutput',1);
fclose(fid);
But it returns a 1x2 cell whose first element is a{1}='ÿþ2'; the others are empty.
I had also tried to adapt to my case the answers to these questions:
importing data with time in MATLAB
Read data files with specific format in matlab and convert date to matal serial time
but I didn't succeed.
How can I import that csv file?
EDIT: After @macduff's answer, I copy-pasted the data shown above into a new file and used:
a = textscan(fid, '%s %f %f','Delimiter',';');
and it works.
Unfortunately that didn't solve the problem because I have to process csv files generated automatically, which seems to be the cause of the strange MATLAB behavior.
What about trying:
a = textscan(fid, '%s %f %f','Delimiter',';');
For me I get:
a =
{4x1 cell} [4x1 double] [4x1 double]
So each element of a corresponds to a column in your csv file. Is this what you need?
Thanks!
Seems you're going about it the right way. The example you provide poses no problems here, I get the output you desire. What's in the 1x2 cell?
If I were you I'd try again with a smaller subset of the file, say 10 lines, and see if the output changes. If yes, then try 100 lines, etc., until you find where the 4x1 cell + 4x2 array breaks down into the 1x2 cell. It might be that there's an empty line or a single empty field or whatever, which forces textscan to collect data in an additional level of cells.
Note that 'CollectOutput',1 will collect the last two columns into a single array, so you'll end up with 1 cell array of 4x1 containing strings, and 1 array of 4x2 containing doubles. Is that indeed what you want? Otherwise, see @macduff's post.
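The 'ÿþ2' value reported in the question looks like the bytes FF FE followed by the first digit of the data, which is what a UTF-16 little-endian byte order mark would produce when read as single-byte text. If the automatically generated files are indeed UTF-16LE (an assumption, not something confirmed in the question), one option is to decode the bytes first and pass the resulting text to textscan. A minimal sketch, with a demo file generated only so the example is self-contained:

```matlab
% Sketch: decode a UTF-16LE file before handing the text to textscan.
% Assumption: the file starts with the BOM bytes FF FE (shown as 'ÿþ').
filename = fullfile(tempdir, 'prova_utf16.csv');    % hypothetical demo file
sample = sprintf(['2009-04-29 01:01:42.000;16271.1;16271.1\r\n' ...
                  '2009-04-29 02:01:42.000;2.5;16273.6\r\n']);
fid = fopen(filename, 'w');
fwrite(fid, [255 254 unicode2native(sample, 'UTF-16LE')], 'uint8');
fclose(fid);

% Read the raw bytes and convert them to MATLAB characters
fid = fopen(filename, 'r');
bytes = fread(fid, '*uint8')';
fclose(fid);
str = native2unicode(bytes, 'UTF-16LE');
if ~isempty(str) && str(1) == char(65279)   % drop the BOM character U+FEFF
    str(1) = [];
end

% Parse the decoded text with the same format as before
a = textscan(str, '%s %f %f', 'Delimiter', ';');
```

If the files turn out to use a different encoding, only the encoding name passed to native2unicode would need to change.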
I've had to parse large files like this, and I found I didn't like textscan for this job. I just use a basic while loop to parse the file, and I use datevec to extract the timestamp components into a 6-element time vector.
%% Optional: initialize for speed if you have large files
n = 1000; %% <# of rows in file - if known>
timestamp = zeros(n,6);
value1 = zeros(n,1);
value2 = zeros(n,1);
fid = fopen(fname, 'rt');
if fid < 0
error('Error opening file %s\n', fname); % exit point
end
cntr = 0;
while true
tline = fgetl(fid); %% get one line
if ~ischar(tline), break; end; % break out of loop at end of file
cntr = cntr + 1;
splitLine = strsplit(tline, ';'); %% split the line on ; delimiters
timestamp(cntr,:) = datevec(splitLine{1}, 'yyyy-mm-dd HH:MM:SS.FFF'); %% using datevec to parse time gives you a standard timestamp vector
value1(cntr) = str2double(splitLine{2}); %% strsplit returns text, so convert to double
value2(cntr) = str2double(splitLine{3});
end
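%% Trim unused preallocated rows, then concatenate at the end if you like
timestamp = timestamp(1:cntr,:); value1 = value1(1:cntr); value2 = value2(1:cntr);
result = [timestamp value1 value2];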

MATLAB: Convert comma separated single cell to multiple cell array whilst maintaining UTF-8 encoding using textscan

From the beginning.
I have data in a csv file like:
La Loi des rues,/m/0gw3lmk,/m/0gw1pvm
L'Étudiante,/m/0j9vjq5,/m/0h6hft_
The Kid From Borneo,/m/04lrdnn,/m/04lrdnt,/m/04lrdn5,/m/04lrdnh,/m/04lrdnb
etc.
This is in UTF-8 format. I import this file as follows (taken from somewhere else):
feature('DefaultCharacterSet','UTF-8');
fid = fopen(filename,'rt'); %# Open the file
lineArray = cell(100,1); %# Preallocate a cell array (ideally slightly
%# larger than is needed)
lineIndex = 1; %# Index of cell to place the next line in
nextLine = fgetl(fid); %# Read the first line from the file
while ~isequal(nextLine,-1) %# Loop while not at the end of the file
lineArray{lineIndex} = nextLine; %# Add the line to the cell array
lineIndex = lineIndex+1; %# Increment the line index
nextLine = fgetl(fid); %# Read the next line from the file
end
fclose(fid); %# Close the file
This makes an array with the UTF-8 text within it. {3x1} array:
'La Loi des rues,/m/0gw3lmk,/m/0gw1pvm'
'L''Étudiante,/m/0j9vjq5,/m/0h6hft_'
'The Kid From Borneo,/m/04lrdnn,/m/04lrdnt,/m/04lrdn5,/m/04lrdnh,/m/04lrdnb'
Now the next part separates each value into an array:
lineArray = lineArray(1:lineIndex-1); %# Remove empty cells, if needed
for iLine = 1:lineIndex-1 %# Loop over lines
lineData = textscan(lineArray{iLine},'%s',... %# Read strings
'Delimiter',',');
lineData = lineData{1}; %# Remove cell encapsulation
if strcmp(lineArray{iLine}(end),',') %# Account for when the line
lineData{end+1} = ''; %# ends with a delimiter
end
lineArray(iLine,1:numel(lineData)) = lineData; %# Overwrite line data
end
This outputs:
'La Loi des rues' '/m/0gw3lmk' '/m/0gw1pvm' [] [] []
'L''�tudiante' '/m/0j9vjq5' '/m/0h6hft_' [] [] []
'The Kid From Borneo' '/m/04lrdnn' '/m/04lrdnt' '/m/04lrdn5' '/m/04lrdnh' '/m/04lrdnb'
The problem is that the UTF-8 encoding is lost on the textscan (note the question mark I now get whereas it was fine in the previous array).
Question: How do I maintain the UTF-8 coding when it translates the {3x1} array into a 3xN array.
I can't find anything on how to keep UTF-8 encoding in a textscan of an array already in the workspace. Everything is to do with importing a text file which I have no problems with - it is the second step.
Thanks!
Try the following code:
%# read whole file as a UTF-8 string
fid = fopen('utf8.csv', 'rb');
b = fread(fid, '*uint8')';
str = native2unicode(b, 'UTF-8');
fclose(fid);
%# split into lines
lines = textscan(str, '%s', 'Delimiter','', 'Whitespace','\n');
lines = lines{1};
%# split each line into values
C = cell(numel(lines),6);
for i=1:numel(lines)
vals = textscan(lines{i}, '%s', 'Delimiter',',');
vals = vals{1};
C(i,1:numel(vals)) = vals;
end
The result:
>> C
C =
'La Loi des rues' '/m/0gw3lmk' '/m/0gw1pvm' [] [] []
'L'Étudiante' '/m/0j9vjq5' '/m/0h6hft_' [] [] []
'The Kid From Borneo' '/m/04lrdnn' '/m/04lrdnt' '/m/04lrdn5' '/m/04lrdnh' '/m/04lrdnb'
Note that when I tested this, I encoded the input CSV file as "UTF-8 without BOM" (I was using Notepad++ as editor)
Try using the following fopen command instead of the one you currently use. It specifies UTF-8 encoding for the file. The encoding is the fourth argument; the third is the machine format, where 'n' means native.
f = fopen(filename,'rt','n','UTF-8');
You can probably shorten up some of the code using this as well:
text = fscanf(f,'%c');
Lines = textscan(text,'%s','Delimiter',',');
That might help alleviate some of the pre-allocation that you're doing there.
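Another option is to avoid textscan for the second step and split each line with strsplit, which works on character arrays already in memory. The sketch below is only an illustration under that assumption; whether it sidesteps the encoding loss on a given MATLAB version has not been verified here. The demo file is generated in the code, and char(201) stands for the letter É so the example does not depend on the encoding of the source file.

```matlab
% Sketch: read a UTF-8 file line by line and split on commas with strsplit.
filename = fullfile(tempdir, 'utf8_demo.csv');      % hypothetical demo file
sample = sprintf(['La Loi des rues,/m/0gw3lmk,/m/0gw1pvm\n' ...
                  'L''%studiante,/m/0j9vjq5,/m/0h6hft_\n'], char(201));
fid = fopen(filename, 'w');
fwrite(fid, unicode2native(sample, 'UTF-8'), 'uint8');
fclose(fid);

fid = fopen(filename, 'r', 'n', 'UTF-8');           % 4th argument sets the encoding
rows = {};
tline = fgetl(fid);
while ischar(tline)
    rows{end+1} = strsplit(tline, ',', 'CollapseDelimiters', false); %#ok<AGROW>
    tline = fgetl(fid);
end
fclose(fid);

% Pad the ragged rows into a rectangular cell array
nCols = max(cellfun(@numel, rows));
C = cell(numel(rows), nCols);
for i = 1:numel(rows)
    C(i, 1:numel(rows{i})) = rows{i};
end
```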

How to convert Matlab variables to .dat (text) file with headers

EDITED QUESTION:
I have 2500 rows x 100 columns data in variable named avg_data_models. I also have 2500 rows x 100 columns variable 'X' and similar size matrix variable 'Y', both containing the co-ordinates. I want to save the values of this variable in a text (.dat) file which must have 302 header lines in the following manner:
avg_data_models
300
X_1
X_2
.
.
.
X_100
Y_1
Y_2
.
.
.
Y_100
avg_data_models_1
avg_data_models_2
avg_data_models_3
.
.
.
.
.
avg_data_models_100
In the above header style, the first line is the name of the file, the 2nd line tells the number of columns (each column has 2500 rows), and the rest of the 300 lines represent the model of each variable respectively - Like 100 models of X, 100 models of Y and 100 models of avg_data_models.
Consider this code:
%# here you have your data X/Y/..
%#X = rand(2500,100);
[r c] = size(X);
prefixX = 'X';
prefixY = 'Y';
prefixData = 'avg_data_models';
%# build a cell array that contains all the header lines
num = strtrim( cellstr(num2str((1:c)','_%d')) ); %#' SO fix
headers = [ prefixData ;
num2str(3*c) ;
strcat(prefixX,num) ;
strcat(prefixY,num) ;
strcat(prefixData,num) ];
%# write to file
fid = fopen('outputFile.dat', 'wt');
fprintf(fid, '%s\n',headers{:});
fclose(fid);
EDIT
It seems I misunderstood the question.. Here's the code to write the actual data (not the header titles!):
%# here you have your data X/Y/..
avg_data_models = rand(2500,100);
X = rand(2500,100);
Y = rand(2500,100);
%# create file, and write the title and number of columns
fid = fopen('outputFile.dat', 'wt');
fprintf(fid, '%s\n%d\n', 'avg_data_models', 3*size(X,2));
fclose(fid);
%# append rest of data
dlmwrite('outputFile.dat', [X Y avg_data_models], '-append', 'delimiter',',')
Note: I used a comma , as delimiter, you can change it to be a space or a tab \t if you like..
You can use fprintf to write the header, like so:
%# define the number of data
nModels = 100;
dataName = 'avg_data_models';
%# open the file
fid = fopen('output.dat','w');
%# start writing. First line: title
fprintf(fid,'%s\n',dataName); %# don't forget \n for newline. Use \r\n if you want to open this in notepad
%# write number of models
fprintf(fid,'%i\n',nModels);
%# loop to write the rest of the header
for iModel = 1:nModels
fprintf(fid,'%s_%i\n',dataName,iModel);
end
%# use your favorite method to write the rest of the data.
%# for example, you could use fprintf again, using \t to add tabs
%# create format-string
%# check the help to fprintf to learn about formatting details
formatString = repmat('%f\t',1,100);
formatString = [formatString(1:end-1),'n']; %# swap the final 't' for 'n', turning the last \t into \n
%# transpose the array, because fprintf reshapes the array to a vector and
%# 'fills' the format-strings sequentially until it runs out of data
fprintf(fid,formatString,avg_data_models'); %'# SO formatting
%# close the file
fclose(fid);
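To see why the transpose matters, here is a minimal sketch with a small made-up matrix in place of the 2500x100 data. The file name is hypothetical. fprintf walks through its data argument in column-major order, so passing M' makes each row of M arrive as one contiguous run that fills one line of the format.

```matlab
% Sketch: writing a matrix row by row with a single fprintf call.
M = [1 2 3; 4 5 6];                                  % 2 rows, 3 columns
formatString = [repmat('%g\t', 1, size(M, 2) - 1) '%g\n'];

outName = fullfile(tempdir, 'fprintf_demo.dat');     % hypothetical file name
fid = fopen(outName, 'w');
fprintf(fid, formatString, M');                      % transpose so rows stay together
fclose(fid);

txt = fileread(outName);                             % read back to inspect the result
```

Without the transpose, the first line of the file would contain 1, 4 and 2, because those are the first three elements of M in memory order.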