Simplest way to read space delimited text file matlab - matlab

Ok, so I'm struggling with the most mundane of things I have a space delimited text file with a header in the first row and a row per observation and I'd like to open that file in matlab. If I do this in R I have no problem at all, it'll create the most basic matrix and voila!
But MATLAB seems to be annoying with this...
Example of the text file:
"picFile" "subjCode" "gender"
"train_1" 504 "m"
etc.
Can I get something like a matrix at all? I would then like to have MATLAB pull out some data by doing data(1,2) for example.
What would be the simplest way to do this?
It seems like having to write a loop using f-type functions is just a waste of time...

If you have a sufficiently new version of Matlab (R2013b+, I believe), you can use readtable, which is very much like how R does it:
T = readtable('data.txt','Delimiter',' ')
There are many functions for manipulating tables and converting back and forth between them and other data types such as cell arrays.
There are some other options in the data import and export section of the Statistics toolbox that should work in older versions of Matlab:
tblread: output in terms of separate variables for strings and numbers
caseread: output in terms of a char array
tdfread: output in terms of a struct
Alternatively, textscan should be able to accomplish what you need and probably will be the fastest:
fid = fopen('data.txt');
header = textscan(fid,'%s',3); % Optionally save header names
C = textscan(fid,'%s%d%s','HeaderLines',1); % Read data skipping header
fclose(fid); % Don't forget to close file
C{:}

Found a way to solve my problem.
Because I don't have the latest version of MATLAB and cannot use readable which would be the preferred option I ended up doing using textread and specifying the format of each column.
Tedious but maybe the "simplest" way I could find:
[picFile subCode gender]=textread('data.txt', '%s %f %s', 'headerlines',1);
T=[picFile(:) subCode(:) gender(:)]
The textscan solution by #horchler seems pretty similar. Thanks!

Related

Create a simple table in Matlab

I've been fighting with fprintf for an hour now, should be easy but it's not apparently.
Have a vector with descriptive statistics called datasave, contains 9 numbers like average, standard dev, kurtosis etc.
And I have a vector datalabels with the lablels 'Average' , 'St.dev', 'Kurt' etc.
Open a file with fileopen
print the labels
new line
print the values ( exactly under the labels!)
close the file
This is what I've tried so far:
fileID = fopen('descstat2.txt','w');
fprintf(fileID,'MediaTonnes MinTonnes MaxTonnes Sigma Skew Kurt SigmadTonnes SkewdTonnes KurtdTonnes\r\n');
format short;
fprintf(fileID, '%g\t%g\t%g\n', datasave.');
Help?
I have at least 50 different combinations so I can't really give you my output...
Maybe try this:
fileID = fopen('descstat2.txt','w');
fprintf(fileID,'%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\r\n', datalabels);
fprintf(fileID, '%g\t%g\t%g\t%g\t%g\t%g\t%g\t%g\t%g\r\n', datasave);
fclose
To use XLSWRITE you need to create a cell array:
out = [datalabels(:)'; num2cell(datasave)];
xlswrite('descstat2', out)
If the file you are saving to is exist as a text file, xlswrite will save it as a text as well.
It probably will be slower that fprintf (due to COM interface) but you don't have to deal with formatting the output. Just need to convert everything to cells.
Another option is to use TBLWRITE from Statistical Toolbox. You can just do:
tblwrite(datasave, datalabels, [], filename, '\t')
It will also put row numbers or labels specified as 3rd argument. Might be useful for some data.

Reading complicated format CSV fileinto Matlab

My raw CSV file looks like the 1st pic. And I wants to use Matlab read it into the format as the 2rd pic. I have over 1000 the same kind of CSV files, it will be painful if I do it by copy/paste. How can I do this? Any examples?
raw data:
output data:
First thing to realize is that a .csv file has a very simple format. Your above file is actually a plain text file with the following text on each line:
id,A001
height
a1,a2,a3
3,4,5
3,4,5
6,7,5
weight
a1,a2,a3
4,4,5
5,4,6
i6,7,5
So it is not all that hard for you to write your own parser in Matlab. You want to use commands like
fid = fopen('filename.csv','r');
L = fgetl(fid); % get a text line from the file
commas = find(L==','); % find where the commas are in the line
n1 = str2num(L(1:commas(1)-1); % convert the first comma-delimited number on line L
fidout - fopen('myfile.csv','w');
Lout = [ L(commas(2)+1:commas(3)-1) ', a1, a1'];
fwrite(fidout,Lout); % write a line out to the output file
fclose all; % close all open files.
It will seem slow at first reading the various values in to various variables, and then arranging them to write out the way you want them written out to your output file. But once you get rolling it will go pretty fast and you will find yourself with a pretty good understanding of what is in files, and you will know first hand what is involved in writing something like texscan.m or csvwrite.m and so on.
Good luck!

How to read text fields into MATLAB and create a single matrix

I have a huge CSV file that has a mix of numerical and text datatypes. I want to read this into a single matrix in Matlab. I'll use a simpler example here to illustrate my problem. Let's say I have this CSV file:
1,foo
2,bar
I am trying to read this into MatLab using:
A=fopen('filename.csv');
B=textscan(A,'%d %d', 'delimiter',',');
C=cell2mat(B);
The first two lines work fine, but the problem is that texscan doesn't create a 2x2 matrix; instead it creates a 1x2 matrix with each value being an array. So I try to use the last line to combine the arrays into one big matrix, but it generates an error because the arrays have different datatypes.
Is there a way to get around this problem? Or a better way to combine the arrays?
I am note sure if combining them is a good idea. It is likely that you would be better off with them separate.
I changed your code, so that it works better:
clear
clc
A=fopen('filename.csv');
B=textscan(A,'%d %s', 'delimiter',',')
fclose(A)
Looking at the results
K>> B{1}
ans =
1
2
K>> B{2}
ans =
'foo'
'bar'
Really, I think this is the format that is most useful. If anything, most people would want to break this cell array into smaller chunks
num = B{1}
txt = B{2}
Why are your trying to combine them? They are already together in a cell array, and that is the most combined you are going to get.
There is a natural solution to this, but it requires the Statistics toolbox (version 6.0 or higher). Mixed data types can be read into a dataset array. See the Mathworks help page here.
I believe you can't use textscan for this purpose. I'd use fscanf which always gives you a matrix as specified. If you don't know the layout of the data it gets kind of tricky however.
fscanf works as follows:
fscanf(fid, format, size)
where fid is the fid generated by the fopen
format is the file format & how you are reading the data (['%d' ',' '%s'] would work for your example file)
size is the matrix dimensions ([2 2] would work on your example file).

Fastest way to import CSV files in MATLAB

I've written a script that saves its output to a CSV file for later reference, but the second script for importing the data takes an ungainly amount of time to read it back in.
The data is in the following format:
Item1,val1,val2,val3
Item2,val4,val5,val6,val7
Item3,val8,val9
where the headers are on the left-most column, and the data values take up the remainder of the row. One major difficulty is that the arrays of data values can be different lengths for each test item. I'd save it as a structure, but I need to be able to edit it outside the MATLAB environment, since sometimes I have to delete rows of bad data on a computer that doesn't have MATLAB installed. So really, part one of my question is: Should I save the data in a different format?
Second part of the question:
I've tried importdata, csvread, and dlmread, but I'm not sure which is best, or if there's a better solution. Right now I'm using my own script using a loop and fgetl, which is horribly slow for large files. Any suggestions?
function [data,headers]=csvreader(filename); %V1_1
fid=fopen(filename,'r');
data={};
headers={};
count=1;
while 1
textline=fgetl(fid);
if ~ischar(textline), break, end
nextchar=textline(1);
idx=1;
while nextchar~=','
headers{count}(idx)=textline(1);
idx=idx+1;
textline(1)=[];
nextchar=textline(1);
end
textline(1)=[];
data{count}=str2num(textline);
count=count+1;
end
fclose(fid);
(I know this is probably terribly written code - I'm an engineer, not a programmer, please don't yell at me - any suggestions for improvement would be welcome, though.)
It would probably make the data easier to read if you could pad the file with NaN values when your first script creates it:
Item1,1,2,3,NaN
Item2,4,5,6,7
Item3,8,9,NaN,NaN
or you could even just print empty fields:
Item1,1,2,3,
Item2,4,5,6,7
Item3,8,9,,
Of course, in order to pad properly you would need to know what the maximum number of values across all the items is before hand. With either format above, you could then use one of the standard file reading functions, like TEXTSCAN for example:
>> fid = fopen('uneven_data.txt','rt');
>> C = textscan(fid,'%s %f %f %f %f','Delimiter',',','CollectOutput',1);
>> fclose(fid);
>> C{1}
ans =
'Item1'
'Item2'
'Item3'
>> C{2}
ans =
1 2 3 NaN %# TEXTSCAN sets empty fields to NaN anyway
4 5 6 7
8 9 NaN NaN
Instead of parsing the string textline one character at a time. You could use strtok to break the string up for example
stringParts = {};
tline = fgetl(fid);
if ~ischar(tline), break, end
i=1;
while 1
[stringParts{i},r]=strtok(tline,',');
tline=r;
i=i+1;
if isempty(r), break; end
end
% store the header
headers{count} = stringParts{1};
% convert the data into numbers
for j=2:length(stringParts)
data{count}(j-1) = str2double(stringParts{j});
end
count=count+1;
I've had the same problem with reading csv data in Matlab, and I was surprised by how little support there is for this, but then I just found the import data tool. I'm in r2015b.
On the top bar in the "Home" tab, click on "Import Data" and choose the file you'd like to read. An app window will come up like this:
Import Data tool screenshot
Under "Import Selection" you have the option to "generate function", which gives you quite a bit of customization options, including how to fill empty cells, and what you'd like the output data structure to be. Plus it's written by MathWorks, so it's probably utilizing the fastest available method to read csv files. It was almost instantaneous on my file.
Q1) If you know the max number of columns you can fill empty entries with NaN
Also, if all values are numerical, do you really need "Item#" column? If yes, you can use only "#", so all data is numerical.
Q2) The fastest way to read num. data from a file without mex-files is csvread.
I try to avoid using strings in csv files, but if I have to, I use my csv2cell function:
http://www.mathworks.com/matlabcentral/fileexchange/20135-csv2cell

How do you save data to a text file in a given format?

I want to save a matrix to a text file, so I can read it by another program. Right now I use:
save('output.txt', 'A','-ascii');
But this saves my file as
6.7206983e+000 2.5896414e-001
6.5710723e+000 4.9800797e-00
6.3466334e+000 6.9721116e-001
5.9975062e+000 1.3346614e+000
6.0224439e+000 1.8127490e+000
6.3466334e+000 2.0517928e+000
6.3965087e+000 1.9721116e+000
But I would like to have them saved without the "e-notation" and not with all the digits. Is there an easy way to do this?
Edit: Thank you! That works just fine. Sorry, but I think I messed up your edit by using the rollback.
I would use the fprintf function, which will allow you to define for yourself what format to output the data in. For example:
fid = fopen('output.txt', 'wt');
fprintf(fid,'%0.6f %0.6f\n', A.');
fclose(fid);
This will output the matrix A with 6 digits of precision after the decimal point. Note that you also have to use the functions fopen and fclose.
Ditto gnovice's solution, if you need performance & custom formatting.
dlmwrite gives you some control (global, not per-field basis) of formatting. But it suffers from lower performance. I ran a test a few years ago and dlmwrite was something like 5-10x slower than the fopen/fprintf/fclose solution. (edit: I'm referring to large matrices, like a 15x10000 matrix)