How to load data for Classification in Matlab

I have a text file containing thousands of attributes (each column indicates an attribute) and a column that shows the labels of each row. All data is numeric except the last column, which contains the labels as strings. I want to use MATLAB classification functions such as gscatter() to classify the data. The problem is that when I use load filename in MATLAB to load my data, I get this error (where "no" is one of the labels):
Unknown text on line number 1 of ASCII file C:\Program Files\MATLAB\R2011b\train\train.txt
"no".
In fact I do not know how to load my data in matlab to be able to use matlab functions to classify the data.

load works only with MAT-files and with text files containing purely numeric data, which is why you get an error.
There are a number of functions which do read text files though.
Depending on the format of your data files, you could use one of the following:
textread is pretty general, but requires you to supply the format.
csvread reads only numeric, comma-separated values, but you don't have to provide a format.
importdata is very general and convenient.
fscanf is similar to textread
Given the number of attributes you're dealing with, I'd definitely go with importdata myself.

Here is an example
train.txt
1,2,3,4,5,6,no
2,3,4,5,6,7,yes
myLoadScript.m
numAttribs = 6; %# number of attributes (excluding the label)
frmt = [repmat('%f ',1,numAttribs) '%s'];
fid = fopen('train.txt', 'rt');
C = textscan(fid, frmt, 'Delimiter',',', 'CollectOutput',1);
fclose(fid);
The result:
>> C{1}
ans =
1 2 3 4 5 6
2 3 4 5 6 7
>> C{2}
ans =
'no'
'yes'
Should be easy to adapt to work on your specific file format...

Related

How can I import this data and export it to an excel file?

I have 672 samples like these in a .txt file:
{
sleep:1360.36,
eat:4.36,
live:16.37,
travel:22.18,
work:22,
school:0.84,
vt:386.87
},
I want to put them in an Excel file where {sleep, eat, live, travel, work, school, vt} are represented in a row and each sample is represented in a column, with the corresponding number matching each label. I've never dealt with text files in this format in MATLAB, so I have no idea how to do this. Can anyone help me?
You can import data from Excel into Matlab using xlsread and export data using xlswrite. See the documentation
Syntax
xlswrite(filename,A,sheet,xlRange)
where A might be a cell array whose cells contain numbers or strings, sheet is the name of the Excel sheet, and xlRange is the range in the Excel sheet (example: A1:B5).
Code example:
A = {'Column1', 'Column2', 'Column3'; 1, 2, 3};
xlswrite('example.xls', A, 'ExampleSheet', 'A1:C2');
Some hints:
If you know the number of rows and columns of your data only at runtime but still want to give a range, you must somehow assemble the range string yourself (rows are easy with sprintf; column names are more difficult (A, B, C, ..., Z, AA, AB, ...)).
If you do not have Excel on your computer, you will get csv files (see documentation)
Although each call to xlswrite returns quite fast, the system is still working in the background. If another call to xlswrite comes too soon, you might get unexpected (delay-dependent) errors with no way to avoid them other than to wait long enough. I usually collect my data and then write everything to an Excel file in one go.
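The column-letter hint above can be sketched as follows (a minimal sketch; makeRange is a hypothetical helper name, not part of any MATLAB toolbox). Excel column letters behave like base 26 with digits A..Z and no zero digit:

```matlab
%// Hypothetical helper: build an xlRange string like 'A1:AB5' at runtime
%// from the number of rows and columns of the data to be written.
function range = makeRange(nRows, nCols)
    letters = '';
    n = nCols;
    while n > 0
        r = mod(n - 1, 26);               %// 0..25 maps to 'A'..'Z'
        letters = [char('A' + r) letters]; %// prepend the next letter
        n = floor((n - 1) / 26);
    end
    range = sprintf('A1:%s%d', letters, nRows);
end
```

For example, makeRange(5, 28) yields 'A1:AB5', which could then be passed as the xlRange argument to xlswrite.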
Very possible. You can do it in MATLAB if you are familiar with it (although it is also quite easy to do in Excel). To load your file (no need to convert it; MATLAB reads .txt files directly), you can do something like:
fileID = fopen('test2.txt'); %//Your file name
Input = textscan(fileID,'%s %f','Delimiter',':');
GoodRows = find(~isnan(Input{2} ));
column1 = Input{1}(GoodRows,:); %// Column 1 is a cell array (since it stores strings)
column2 = Input{2}(GoodRows,:); %// Column 2 is a numeric matrix, which lets you take the numbers and compute averages etc.
The cell array and the matrix share indices, so you can eventually reformat your data into a single cell array and export it from MATLAB.
column1 =
'sleep'
'eat'
'live'
'travel'
'work'
'school'
'vt'
column2 =
1.0e+003 *
1.3604
0.0044
0.0164
0.0222
0.0220
0.0008
0.3869
==============EDIT===============
If you have multiple columns after the String, i.e.:
sleep,1.5,1.4,1.3
If you want to keep using textscan, you will need to specify how many columns there are. This is done by either:
Input = textscan(fileID,'%s %f %f %f','Delimiter',','); %// add a %f for each numeric column
Or
Input = textscan(fileID,['%s' repmat(' %f',[1,N])],'Delimiter',','); %// where N is the number of numeric columns you have

Import text file in MATLAB

I have a tab delimited text file with suffix .RAW.
How can I load the data from the file into a matrix in MATLAB?
I have found readtable, but it doesn't support files ending with suffix .RAW.
Do I really have to use fread, fscanf, etc. to simply load a text file into a matrix?
You can use the dlmread() function. It will read data from an ASCII text file into a matrix and let you define the delimiter yourself. The delimiter for tabs is '\t'.
>> M = dlmread('Data.raw', '\t')
M =
1 2 3
4 5 6
7 8 9
Just for your information, there is also the tdfread() function, but I do not recommend using it except in very specific cases. dlmread() is a much better option.
.RAW is a generic file extension. You should know the format of your RAW file (especially if it contains a combination of numbers, data structures, etc.). If it is a simple text file with a single 2D table, you can easily read it with fscanf, fread, fgetl, fgets, etc.
Here is a simple example for a 2D table (matrix):
Let's assume that each row of your table is separated from the following rows by a carriage return. We can read each row with fgetl() and then extract the numbers using str2num().
fid = fopen('YourTextFile.RAW');
Data = [];
i = 0;
while 1
    i = i + 1;
    tline = fgetl(fid);
    if ~ischar(tline), break, end
    Data(i,:) = str2num(tline);
end
fclose(fid);
disp(Data)
For more complex data structure, the code should be changed.
For a 2D table (a special case), the simple code above can easily be replaced by the dlmread() function.

Problem concatenating a matrix of numbers with a vector of strings (column labels) using cell2mat

I'm a Mac user (10.6.8) using MATLAB to process calculation results. I output large tables of numbers to .csv files. I then use the .csv files in EXCEL. This all works fine.
The problem is that each column of numbers needs a label (a string header). I can't figure out how to concatenate labels to the table of numbers. I would very much appreciate any advice. Here is some further information that might be useful:
My labels are contained within a cell array:
columnsHeader = cell(1,15)
that I fill in with calculation results; for example:
columnsHeader{1} = propertyStringOne (where propertyStringOne = 'Liq')
The sequence of labels is different for each calculation. My first attempt was to try and concatenate the labels directly:
labelledNumbersTable=cat(1,columnsHeader,numbersTable)
I received an error that concatenated types need to be the same. So I tried converting the labels/strings using cell2mat:
columnsHeader = cell2mat(columnsHeader);
labelledNumbersTable = cat(1,columnsHeader,numbersTable)
But that took ALL the separate labels and made them into one long word... Which leads to:
??? Error using ==> cat
CAT arguments dimensions are not consistent.
Does anyone know of an alternative method that would allow me to keep my original cell array of labels?
You will have to handle writing the column headers and the numeric data to the file in two different ways. Outputting your cell array of strings will have to be done using the FPRINTF function, as described in this documentation for exporting cell arrays to text files. You can then output your numeric data by appending it to the file (which already contains the column headers) using the function DLMWRITE. Here's an example:
fid = fopen('myfile.csv','w'); %# Open the file
fprintf(fid,'%s,',columnsHeader{1:end-1}); %# Write all but the last label
fprintf(fid,'%s\n',columnsHeader{end}); %# Write the last label and a newline
fclose(fid); %# Close the file
dlmwrite('myfile.csv',numbersTable,'-append'); %# Append your numeric data
The solution to the problem is already shown by others. I am sharing a slightly different solution that improves performance especially when trying to export large datasets as CSV files.
Instead of using DLMWRITE to write the numeric data (which internally uses a for-loop over each row of the matrix), you can directly call FPRINTF to write the whole thing at once. You can see a significant improvement if the data has many rows.
Example to illustrate the difference:
%# some random data with column headers
M = rand(100000,5); %# 100K rows, 5 cols
H = strtrim(cellstr(num2str((1:size(M,2))','Col%d'))); %# headers
%# FPRINTF
tic
fid = fopen('a.csv','w');
fprintf(fid,'%s,',H{1:end-1});
fprintf(fid,'%s\n',H{end});
fprintf(fid, [repmat('%.5g,',1,size(M,2)-1) '%.5g\n'], M'); %# default precision = 5
fclose(fid);
toc
%# DLMWRITE
tic
fid = fopen('b.csv','w');
fprintf(fid,'%s,',H{1:end-1});
fprintf(fid,'%s\n',H{end});
fclose(fid);
dlmwrite('b.csv', M, '-append');
toc
The timings on my machine were as follows:
Elapsed time is 0.786070 seconds. %# FPRINTF
Elapsed time is 6.285136 seconds. %# DLMWRITE

Fastest way to import CSV files in MATLAB

I've written a script that saves its output to a CSV file for later reference, but the second script, which imports the data, takes an inordinate amount of time to read it back in.
The data is in the following format:
Item1,val1,val2,val3
Item2,val4,val5,val6,val7
Item3,val8,val9
where the headers are on the left-most column, and the data values take up the remainder of the row. One major difficulty is that the arrays of data values can be different lengths for each test item. I'd save it as a structure, but I need to be able to edit it outside the MATLAB environment, since sometimes I have to delete rows of bad data on a computer that doesn't have MATLAB installed. So really, part one of my question is: Should I save the data in a different format?
Second part of the question:
I've tried importdata, csvread, and dlmread, but I'm not sure which is best, or if there's a better solution. Right now I'm using my own script using a loop and fgetl, which is horribly slow for large files. Any suggestions?
function [data,headers] = csvreader(filename) %V1_1
fid = fopen(filename,'r');
data = {};
headers = {};
count = 1;
while 1
    textline = fgetl(fid);
    if ~ischar(textline), break, end
    nextchar = textline(1);
    idx = 1;
    while nextchar ~= ','
        headers{count}(idx) = textline(1);
        idx = idx + 1;
        textline(1) = [];
        nextchar = textline(1);
    end
    textline(1) = [];
    data{count} = str2num(textline);
    count = count + 1;
end
fclose(fid);
(I know this is probably terribly written code - I'm an engineer, not a programmer, please don't yell at me - any suggestions for improvement would be welcome, though.)
It would probably make the data easier to read if you could pad the file with NaN values when your first script creates it:
Item1,1,2,3,NaN
Item2,4,5,6,7
Item3,8,9,NaN,NaN
or you could even just print empty fields:
Item1,1,2,3,
Item2,4,5,6,7
Item3,8,9,,
Of course, in order to pad properly, you would need to know the maximum number of values across all the items beforehand. With either format above, you could then use one of the standard file-reading functions, like TEXTSCAN for example:
>> fid = fopen('uneven_data.txt','rt');
>> C = textscan(fid,'%s %f %f %f %f','Delimiter',',','CollectOutput',1);
>> fclose(fid);
>> C{1}
ans =
'Item1'
'Item2'
'Item3'
>> C{2}
ans =
1 2 3 NaN %# TEXTSCAN sets empty fields to NaN anyway
4 5 6 7
8 9 NaN NaN
Instead of parsing the string textline one character at a time, you could use strtok to break the string up. For example:
stringParts = {};
tline = fgetl(fid);
if ~ischar(tline), break, end
i = 1;
while 1
    [stringParts{i},r] = strtok(tline,',');
    tline = r;
    i = i + 1;
    if isempty(r), break; end
end
% store the header
headers{count} = stringParts{1};
% convert the data into numbers
for j = 2:length(stringParts)
    data{count}(j-1) = str2double(stringParts{j});
end
count = count + 1;
I've had the same problem reading CSV data in MATLAB, and I was surprised by how little support there is for it, but then I found the Import Data tool. I'm on R2015b.
On the top bar in the "Home" tab, click on "Import Data" and choose the file you'd like to read. An app window will come up like this:
[Import Data tool screenshot]
Under "Import Selection" you have the option to "generate function", which gives you quite a few customization options, including how to fill empty cells and what you'd like the output data structure to be. Plus, it's written by MathWorks, so it probably uses the fastest available method to read CSV files. It was almost instantaneous on my file.
Q1) If you know the max number of columns, you can fill empty entries with NaN.
Also, if all values are numerical, do you really need the "Item#" column? If yes, you can use only "#", so all data is numerical.
Q2) The fastest way to read numeric data from a file without MEX-files is csvread.
I try to avoid using strings in csv files, but if I have to, I use my csv2cell function:
http://www.mathworks.com/matlabcentral/fileexchange/20135-csv2cell
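As a minimal sketch of the csvread recommendation above (data.csv is a hypothetical, purely numeric file):

```matlab
M  = csvread('data.csv');        %// read the entire numeric file into a matrix
M2 = csvread('data.csv', 1, 0);  %// start at row offset 1, i.e. skip a header row
```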

How do you create a matrix from a text file in MATLAB?

I have a text file which has 4 columns, each column having 65536 data points. Every element in the row is separated by a comma. For example:
X,Y,Z,AU
4010.0,3210.0,-440.0,0.0
4010.0,3210.0,-420.0,0.0
etc.
So, I have 65536 rows, each row having 4 data values as shown above. I want to convert it into a matrix. I tried importing data from the text file to an excel file, because that way its easy to create a matrix, but I lost more than half the data.
If all the entries in your file were numeric, you could simply use a = load('file.txt'), which would create a 65536x4 matrix a and is even easier than csvread. Note, however, that load cannot parse the X,Y,Z,AU header line (or the comma separators), so for this file you would have to strip the header and replace the commas with spaces first.
Have you ever tried using importdata?
The only parameters you need are the file name and the delimiter.
>> tmp_data = importdata('your_file.txt',',')
tmp_data =
data: [2x4 double]
textdata: {'X' 'Y' 'Z' 'AU'}
colheaders: {'X' 'Y' 'Z' 'AU'}
>> tmp_data.data
ans =
4010 3210 -440 0
4010 3210 -420 0
>> tmp_data.textdata
ans =
'X' 'Y' 'Z' 'AU'
Instead of messing with Excel, you should be able to read the text file directly into MATLAB (using the functions FOPEN, FGETL, FSCANF, and FCLOSE):
fid = fopen('file.dat','rt'); %# Open the data file
headerChars = fgetl(fid); %# Read the first line of characters
data = fscanf(fid,'%f,%f,%f,%f',[4 inf]).'; %# Read the data into a 65536-by-4 matrix
fclose(fid); %# Close the data file
The easiest way to do it would be to use MATLAB's csvread function.
There is also this tool which reads CSV files.
You could do it yourself without too much difficulty: just loop over each line in the file, split it on commas, and put the values in your array.
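That loop-and-split approach might look something like this (a sketch, assuming the file layout shown above with a header row; regexp with the 'split' option is used rather than strsplit so it also works on older releases):

```matlab
%// Sketch: read the comma-separated file line by line, splitting each
%// row on commas and converting the pieces to numbers.
fid = fopen('file.txt', 'rt');
fgetl(fid);                              %// skip the X,Y,Z,AU header line
data = [];
tline = fgetl(fid);
while ischar(tline)
    parts = regexp(tline, ',', 'split'); %// cell array of number strings
    data(end+1, :) = str2double(parts);  %#ok<AGROW> %// append one row
    tline = fgetl(fid);
end
fclose(fid);
```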
I suggest you familiarize yourself with dlmread and textscan.
dlmread is like csvread, but because it can handle any delimiter (tab, space, etc.), I tend to use it rather than csvread.
textscan is the real workhorse: lots of options, plus it works on open files and is a little more robust at handling "bad" input (e.g. non-numeric data in the file). It can be used like fscanf in gnovice's suggestion, but I think it is faster (don't quote me on that, though).
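As a sketch of the textscan route for the four-column file described above (the 'HeaderLines' option skips the X,Y,Z,AU row; the file name is an assumption):

```matlab
%// Read four comma-separated numeric columns, skipping one header line.
fid = fopen('file.txt', 'rt');
C = textscan(fid, '%f%f%f%f', 'Delimiter',',', ...
             'HeaderLines',1, 'CollectOutput',1);
fclose(fid);
M = C{1};   %// with 65536 data rows this is a 65536-by-4 numeric matrix
```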