I have a comma-separated file with 182 rows and 501 columns: the first 500 columns are numbers (features) and the last column contains strings (labels).
Example (182x501):
1,3,4,6,.........7, ABC
4,5,6,4,.........9, XYZ
3,4,5,3,.........2, ABC
How can I load this file to get a data set with a matrix, B, containing the numbers as my features, and a vector, C, containing the strings as my labels?
d = dataset(B, C);
Build a format specifier for textscan based on the number and types of columns, and have it read the file for you.
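nNumberCols = 500;
% One '%f,' per numeric column (the commas are matched literally), then '%s' for the label.
% fmt avoids shadowing the built-in function FORMAT.
fmt = [repmat('%f,', [1 nNumberCols]) '%s'];
fid = fopen(filename); % filename: path to your CSV file
x = textscan(fid, fmt);
fclose(fid);
% Each numeric column comes back in its own cell; place them side by side
B = cat(2, x{1:nNumberCols});
C = x{end};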
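A minimal self-contained run of the same approach, assuming a 5-column toy version of the file (the sample rows with the ellipses removed) written to a temporary file; for the real data nNumberCols would be 500 and filename would point at your CSV:

```matlab
% Sketch: read N numeric columns plus a trailing label column with textscan.
% Assumption: a 5-column toy file stands in for the 500-column original.
nNumberCols = 5;
filename = [tempname '.csv'];            % hypothetical temporary file
fid = fopen(filename, 'w');
fprintf(fid, '1,3,4,6,7, ABC\n4,5,6,4,9, XYZ\n3,4,5,3,2, ABC\n');
fclose(fid);

% One '%f,' per numeric column, then '%s' for the label
fmt = [repmat('%f,', 1, nNumberCols) '%s'];
fid = fopen(filename);
x = textscan(fid, fmt);
fclose(fid);
delete(filename);

B = cat(2, x{1:nNumberCols});   % 3x5 numeric matrix
C = x{end};                     % 3x1 cell array of labels
```

Here fmt expands to '%f,%f,%f,%f,%f,%s'; the literal commas in the format are matched and skipped, and %s skips the blank in front of each label.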
You could use the textscan function. For example:
fid = fopen('test.dat');
% Read numbers and string into a cell array
data = textscan(fid, '%s %s');
% Then extract the numbers and strings into their own cell arrays
nums = data{1};
str = data{2};
% Convert string of numbers to numbers
for i = 1:length(str)
nums{i} = str2num(nums{i}); %#ok<ST2NM>
end
% Finally, convert cell array of numbers to a matrix
nums = cell2mat(nums);
fclose(fid);
Note that I have made a number of assumptions here, based on the file format you have specified. For example, I assume that there are no spaces after the commas following a number, but that there is a space immediately preceding the string at the end of each line.
You can make the above code more flexible by using a more carefully chosen format specifier (the second argument to textscan). See the section Basic Conversion Specifiers in the textscan documentation.
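One way to make the specifier more flexible, sketched here on an assumed 5-column version of the sample data: drop the literal commas from the format, let textscan split on them through the 'Delimiter' option, and use 'CollectOutput' so the numeric columns arrive as a single matrix. The temporary file name is hypothetical.

```matlab
% Sketch: let textscan split on commas instead of embedding them in the format.
nNumberCols = 5;                      % 500 for the real file
filename = [tempname '.csv'];         % hypothetical temporary file
fid = fopen(filename, 'w');
fprintf(fid, '1,3,4,6,7, ABC\n4,5,6,4,9, XYZ\n3,4,5,3,2, ABC\n');
fclose(fid);

fmt = [repmat('%f', 1, nNumberCols) '%s'];
fid = fopen(filename);
% 'CollectOutput' merges the consecutive %f columns into one double matrix
x = textscan(fid, fmt, 'Delimiter', ',', 'CollectOutput', true);
fclose(fid);
delete(filename);

B = x{1};            % 3x5 double matrix of features
C = strtrim(x{2});   % 3x1 cell of labels; strtrim guards against a leading blank
```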
For example, if you have the following data in a file named data.txt:
1,3,4,6,7, ABC
4,5,6,4,9, XYZ
3,4,5,3,2, ABC
you can read it into a matrix B and a cell array C using the code
N = 5; % Number of numeric data to read
fid = fopen('data.txt');
B = []; C = {};
while ~feof(fid) % repeat until end of file is reached
b = fscanf(fid, '%f,', N); % read N numeric data separated by a comma
if isempty(b), break; end % only trailing whitespace was left: stop
c = fscanf(fid, '%s', 1); % read a string
B = [B, b];
C{end+1} = c;
end
C
B
fclose(fid);
to give
C =
'ABC' 'XYZ' 'ABC'
B =
1 4 3
3 5 4
4 6 5
6 4 3
7 9 2
I found a function (see below) that creates a .csv file from a cell array. I modified it to work with a cellArray{z}{g} instead of a cellArray{z, s}. In the resulting .csv file, everything that should be split across several columns ends up in one column, so I have only one column with several rows. I don't know how to separate these. For example, cellArray{1}{1} contains first a string and then values like 13.4156. How can I make each cell in cellArray{z}{g} end up in a separate cell in the .csv file?
function cell2csv(filename, cellArray, delimiter)
% Writes cell array content into a *.csv file.
%
% CELL2CSV(filename,cellArray,delimiter)
%
% filename = Name of the file to save. [ i.e. 'text.csv' ]
% cellarray = Name of the Cell Array where the data is in
% delimiter = separating sign, normally:',' (default)
%
% by Sylvain Fiedler, KA, 2004
% modified by Rob Kohr, Rutgers, 2005 - changed to english and fixed delimiter
if nargin<3
delimiter = ',';
end
datei = fopen(filename,'w');
for z=1:size(cellArray,1)
for g=1:length(cellArray{z})
var = eval(['cellArray{z}{g}']);
if size(var,1) == 0
var = '';
end
if isnumeric(var) == 1
var = num2str(var);
end
fprintf(datei,var);
end
fprintf(datei,'\n');
end
fclose(datei);
Take a look at the code below:
A = cell(3,1);
for ind1 = 1:3
A{ind1} = {ind1,ind1,'cat'};
end
%% Approach 1 - xlswrite
B = vertcat(A{:});
xlswrite('somefile.xls',B);
%% Approach 2 - printf
% A is 3x1 cell of (1x3 cell) objects
B = cellfun(@(x)[num2str(x) ','], vertcat(A{:}),'UniformOutput',false);
% B is 3x3 cell of char vectors, each ending in ','
C = num2cell(B,2);
% C is 3x1 cell; each element is a 1x3 cell holding one row of B
fid = fopen('somefile.csv','w');
for ind1 = 1:size(C,1)
fprintf(fid, '%s\n',[C{ind1}{:}]);
end
fclose(fid);
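This can be solved by changing fprintf(datei,var); to fprintf(datei,'%s,',var);, which writes a comma after every field (each line then ends with a trailing comma). Passing var through '%s' also stops any % or \ characters in the data from being interpreted as format codes.
Then, when opening the CSV, just use , as the delimiter.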
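A sketch of the whole loop rewritten around this idea, assuming a small hypothetical cellArray of 1xN cells and a temporary output file. It joins each inner cell's fields with strjoin, so no trailing delimiter is written, and prints each line through '%s' so the data is never read as a format string:

```matlab
% Sketch: write a cell array whose rows are themselves cell arrays
% (cellArray{z}{g}) so that every inner element becomes its own CSV field.
cellArray = {{'alpha', 13.4156, 2}; {'beta', 7, 3}};   % hypothetical data
filename = [tempname '.csv'];                          % hypothetical file
delimiter = ',';

fid = fopen(filename, 'w');
for z = 1:numel(cellArray)
    row = cellArray{z};
    fields = cell(1, numel(row));
    for g = 1:numel(row)
        v = row{g};
        if isnumeric(v)
            v = num2str(v);          % numbers -> text
        end
        fields{g} = v;
    end
    % Join the fields with the delimiter and print them with '%s' so that
    % '%' or '\' characters inside the data are not treated as format codes
    fprintf(fid, '%s\n', strjoin(fields, delimiter));
end
fclose(fid);

txt = fileread(filename);   % read the result back for inspection
delete(filename);
```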
Right now I can only write fixed-length data to the text file, but I need to switch to variable-length data.
My code:
fileID = fopen(logfilePathLocal,'at+');
formatSpec = '%s,%d,%d,%d,%d,%d,%d,%d,%d,%d,%d,%d,%d,%d,%d\n';
fprintf(fileID,formatSpec,data{1,:});
fclose(fileID);
You can use %f instead of %d, and specify width and precision, e.g. width 3 and precision 5 would be %3.5f.
For more information on specifier syntax, see the fprintf reference in the MATLAB documentation.
EDIT: If what you mean instead is that you don't know how many %d fields your format string will end up having, you can construct the format string first (e.g. by concatenation or with sprintf) and then pass the resulting string to fprintf, e.g.
N = 5; % number of data points, subject to change on each run
% construct the format string: '%s' followed by N comma-separated '%d' fields
s = ['%s' repmat('%d, ', 1, N-1) '%d\n'];
% use it with fprintf to print your data
fprintf(s, 'The data is: ', data{1,:});
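When the row itself tells you how many values it has, the count can be taken from numel instead of being set by hand. A sketch, assuming a hypothetical row whose first element is a label and whose remaining elements are integers:

```matlab
% Sketch: size the format from the data itself rather than hard-coding it.
% Assumption: the first element of the row is a label, the rest are integers.
data = {'run42', 3, 1, 4, 1, 5};                 % hypothetical row
nNumbers = numel(data) - 1;
fmt = ['%s' repmat(',%d', 1, nNumbers) '\n'];    % '%s,%d,%d,%d,%d,%d\n'
txt = sprintf(fmt, data{:});                     % expand the cell as arguments
```

The same fmt can then be passed to fprintf(fileID, fmt, data{1,:}) to append the row to the log file.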
Following the answer from here, you can build the format specification dynamically as a character array, sized to each data entry, and pass it to fprintf together with that entry.
clear
clc
fid = fopen('file.txt','at+');
% Generate some random data in a cell
data = {rand(5,1) rand(3,1)};
% Calculate required dimensions for fprintf
% and store the results in file.txt
n = size(data,2);
m = [];
for i=1:n
m = [m size(data{i},1) - 1];
end
% Finally, we will have n = 2, m = [4 2]
% Build the format for each entry and print that entry with it
% (%d applied to non-integers falls back to %e notation)
fmt = [];
for i=1:n
for j=1:m(i)
fmt = [fmt '%d,'];
end
fmt = [fmt '%d\n'];
fprintf(fid, fmt, data{i});
fmt = [];
end
fclose(fid);
An example of the output written to file.txt:
9.063082e-01,8.796537e-01,8.177606e-01,2.607280e-01,5.943563e-01
2.251259e-02,4.252593e-01,3.127189e-01
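The e-notation in that output comes from applying %d to non-integer values, which makes MATLAB fall back to %e. As an alternative sketch (a swapped-in technique, not the answer's), the format can be kept short and recycled: sprintf repeats '%g,' for every element of a vector of any length, so neither eval nor a hand-built format is needed. The data below are hypothetical:

```matlab
% Sketch: one line per vector, vectors may differ in length.
% sprintf reuses the format '%g,' once per element.
data = {[1; 2; 3; 4; 5], [6; 7; 8]};   % hypothetical data of unequal length
out = '';
for i = 1:numel(data)
    row = sprintf('%g,', data{i});     % e.g. '1,2,3,4,5,'
    % drop the trailing comma and end the line
    out = [out row(1:end-1) sprintf('\n')]; %#ok<AGROW>
end
```

Writing the result is then fprintf(fid, '%s', out).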
I have a function [Q,A] = load_test(filename) that loads a text file. I would like the function to skip empty lines, but I'm not sure how to do it.
I have tried to use
~isempty(x), ~ischar(x)
but I keep getting an error message. My code so far is:
fid = fopen(filename);
data = textscan(fid, '%s','delimiter','\n');
fclose(fid);
Q = cellfun(@(x) x(1:end-2), data{1}, 'uni',0);
A = cellfun(@(x) x(end) == 'T' || x(end) == 'F' && ~isempty(x),data{1});
What do I need to do?
Code
%%// Your code
fid = fopen(filename);
data = textscan(fid, '%s','delimiter','\n')
fclose(fid);
%%// Additional code
%%// 1. Remove empty lines
c1 = ~cellfun(@isempty,data{:})
t1 = data{:,:}(c1,:)
%%// 2. Select only the lines that have F or T as end characters
lastInLine = regexp(t1,'.$','match','lineanchors') %%// Get the end characters
%%// Get a binary array of rows that have F or T at the end
c2 = strcmp(vertcat(lastInLine{:}),'F') | strcmp(vertcat(lastInLine{:}),'T')
%%// Finally select those rows/lines
data = {t1(c2,:)}
Please note that I am not sure if you still need Q and A.
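A sketch of the load_test logic that tolerates empty or blank lines, assuming (as the original Q and A expressions suggest) that each useful line ends with a space followed by T or F. The test file and its contents are hypothetical, and the code is written as a script rather than as the load_test function:

```matlab
% Sketch: load lines, skip empty/blank ones, split into question and answer.
% Assumption: useful lines look like 'Is water wet? T'.
filename = [tempname '.txt'];                 % hypothetical test file
fid = fopen(filename, 'w');
fprintf(fid, 'Is water wet? T\n\nIs fire cold? F\n   \n');
fclose(fid);

fid = fopen(filename);
data = textscan(fid, '%s', 'Delimiter', '\n');
fclose(fid);
delete(filename);

txtLines = strtrim(data{1});                        % remove surrounding blanks
txtLines = txtLines(~cellfun(@isempty, txtLines));  % drop empty/blank lines
% keep only lines whose last character is T or F
lastChar = cellfun(@(s) s(end), txtLines);
txtLines = txtLines(lastChar == 'T' | lastChar == 'F');

% Q: text before the final T/F; A: true where the line ends in T
Q = cellfun(@(s) strtrim(s(1:end-1)), txtLines, 'UniformOutput', false);
A = cellfun(@(s) s(end) == 'T', txtLines);
```

Because empty lines are removed before any s(end) indexing, the anonymous functions never see an empty character vector.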
A few days ago I posted this question and got the following splendid solution:
fid = fopen('C:\Testes_veloc\test.txt', 'Wt');
fmt = sprintf('%s;%d;%%d;%d;%d;%%d;%%d;%%d;%%.4f \\n',str1,num1,0,2)
[a,b,c,d] = ndgrid(vect2,vect1,day,1:15);
out = sprintf(fmt, [d(:), c(:), b(:), a(:), reshape(permute(MD,[2,1,3,4]),[],1)]');
fprintf(fid, '%s', out);
The variables str1, num1, day, vect1, vect2 and MD are inputs of this function:
str1 is a string 1x1
num1 is an integer 1x1
day is a vector 10x1
vect1 is a vector 7x1
vect2 is a vector 180x1
MD is a 4D matrix (7x180x10x15)
The objective was to have a text file as follows:
result.txt:
RED;12;7;0;2;1;4;7;0.0140
RED;12;7;0;2;2;9;7;0.1484
RED;12;7;0;2;3;7;4;0.1787
RED;12;7;0;2;4;2;6;0.7891
RED;12;7;0;2;5;9;6;0.1160
RED;12;7;0;2;6;9;1;0.9893
However, str1 is not a 1x1 string: it is a 189000x1 vector of names, one for each line of the desired output. In other words, instead of only 'RED', I have many different strings. Is it possible to adjust this vectorized code to this situation?
I tried this (adding str1(:) to the concatenation), but without success:
fmt = sprintf('%%s;%s;%d;%%d;%d;%d;%%d;%%d;%%d;%%.4f \\n',str1,num1,0,2)
out = sprintf(fmt, [str1 (:), d(:), c(:), b(:), a(:), reshape(permute(MD,[2,1,3,4]),[],1)]');
For example, str1{1} = 'RED'; str1{2} = 'FHAW'; str1{3} = 'KI81'; that is, a cell array like this.
It fails to concatenate the strings with the numbers. Does anyone have a solution?
Thanks in advance.
sprintf (like fprintf) fills format fields using the arguments in the order they are provided. If you provide more arguments than the format calls for, these functions reuse the format for the additional arguments:
sprintf('%s %i\n', 'a', 1, 'b', 2, 'c', 3)
% returns
ans =
a 1
b 2
c 3
Using MATLAB's comma-separated-list expansion of cell arrays (tmp{:}), you can prepare your arguments first and then pass them to sprintf:
tmp = {'a', 1, 'b', 2, 'c', 3};
sprintf('%s %i\n', tmp{:})
You can get fancier by concatenating cell arrays:
tmp1 = {'a','b','c'};
tmp2 = [1 2 3];
tmp = [tmp1' num2cell(tmp2')]'
sprintf('%s %i\n', tmp{:})
% Returns
tmp =
'a' 'b' 'c'
[1] [2] [3]
ans =
a 1
b 2
c 3
Note that the layout of tmp is the transpose of the layout in the format. This is because MATLAB expands tmp{:} in column-major order: it walks down the first column, then the second, and so on, so each column of tmp supplies the arguments for one line of output.
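A small illustration of the column-major point, reusing the tmp1/tmp2 values from above: with the transpose, each column holds one label and its number; without it, sprintf receives all the labels first and all the numbers afterwards, so the pairs no longer line up.

```matlab
% Sketch: tmp{:} expands a cell array in column-major order, so the
% values that belong on one output line must sit in one column.
labels = {'a', 'b', 'c'};
values = [1 2 3];

good = [labels' num2cell(values')]';   % 2x3: column k = {label_k; value_k}
bad  = [labels' num2cell(values')];    % 3x2: all labels, then all values

sGood = sprintf('%s %i\n', good{:});   % arguments: 'a',1,'b',2,'c',3
sBad  = sprintf('%s %i\n', bad{:});    % arguments: 'a','b','c',1,2,3 (mismatched)
```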
So, in your scenario, you need to create a large cell array with your arguments, then pass that to sprintf.
fid = fopen('C:\Testes_veloc\test.txt', 'Wt');
fmt = sprintf('%%s;%d;%%d;%d;%d;%%d;%%d;%%d;%%.4f \\n',num1,0,2)
[a,b,c,d] = ndgrid(vect2,vect1,day,1:15);
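% str1 must be a cell array of char vectors (use cellstr(str1) for a char matrix);
% each row is {name, d, c, b, a, MD value}, and the final transpose makes each row a column
tmp = [str1(:) num2cell([d(:) c(:) b(:) a(:) reshape(permute(MD,[2,1,3,4]),[],1)])]';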
fprintf(fid, fmt, tmp{:});
fclose(fid);
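The same pattern at toy scale, with three hypothetical names and a 3x2 block of numbers standing in for str1 and the ndgrid/MD columns (num1 = 12 is baked into the format, as in the answer). If str1 is a char matrix rather than a cell array, convert it with cellstr(str1) first:

```matlab
% Sketch of the cell-array approach at toy scale (all values hypothetical).
str1 = {'RED'; 'FHAW'; 'KI81'};          % cell array of char vectors
nums = [1 0.0140; 2 0.1484; 3 0.1787];   % one row per output line
num1 = 12;

fmt = sprintf('%%s;%d;%%d;%%.4f\\n', num1);   % -> '%s;12;%d;%.4f\n'
tmp = [str1(:) num2cell(nums)]';              % 3x3, one output line per column
out = sprintf(fmt, tmp{:});
```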
Perhaps this question has been answered before but I can't seem to find any good documentation on it. So my problem is the following:
Suppose I have two vectors of the same length in Matlab
x = [1;2;3];
and
y = ['A';'B';'C'];
Basically, I would like to create the matrix {x,y} (i.e., 3 rows and 2 columns) and then write it to a .csv file, so that in the end I see a .csv file like
1,A
2,B
3,C
This is just a mocked-up example but really I have 75 columns with each being either a column of strings or numerics. Any suggestions are greatly appreciated!
Actually, here is the solution:
http://www.mathworks.com/help/matlab/import_export/write-to-delimited-data-files.html#br2ypq2-1
This works much more simply.
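For the x and y example, a minimal sketch of the same idea with fprintf, assuming a temporary output file: convert the numbers with num2cell and the character rows with cellstr, then transpose so that tmp{:} yields one number/label pair per line. For 75 mixed columns, the format string would need one %d or %s per column, in the same order as the columns.

```matlab
% Sketch: write a numeric column and a character column side by side.
x = [1; 2; 3];
y = ['A'; 'B'; 'C'];
filename = [tempname '.csv'];                 % hypothetical output file

% Build a 2x3 cell: row 1 holds numbers, row 2 holds labels, so tmp{:}
% yields 1,'A',2,'B',3,'C' in column-major order
tmp = [num2cell(x) cellstr(y)]';
fid = fopen(filename, 'w');
fprintf(fid, '%d,%s\n', tmp{:});
fclose(fid);

txt = fileread(filename);   % read back for inspection
delete(filename);
```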
If you sort your data into a suitable cell array
A = cell(3,2);
A{1,1} = 1;
A{2,1} = 2;
A{3,1} = 3;
A{1,2} = 'A';
A{2,2} = 'B';
A{3,2} = 'C';
you may then call this function:
cell2csv(filename,A)
function cell2csv(filename,cellArray,delimiter)
% Writes cell array content into a *.csv file.
%
% CELL2CSV(filename,cellArray,delimiter)
%
% filename = Name of the file to save. [ i.e. 'text.csv' ]
% cellarray = Name of the Cell Array where the data is in
% delimiter = separating sign, normally:',' (default)
%
% by Sylvain Fiedler, KA, 2004
% modified by Rob Kohr, Rutgers, 2005 - changed to english and fixed delimiter
if nargin<3
delimiter = ',';
end
datei = fopen(filename,'w');
for z=1:size(cellArray,1)
for s=1:size(cellArray,2)
var = cellArray{z,s};
if size(var,1) == 0
var = '';
end
if isnumeric(var) == 1
var = num2str(var);
end
fprintf(datei,'%s',var); % print as data, not as a format string
if s ~= size(cellArray,2)
fprintf(datei,[delimiter]);
end
end
fprintf(datei,'\n');
end
fclose(datei);