Confused with .tsv files in MATLAB (converting to a Matrix?)

I have a .tsv file that I wish to open in MATLAB, however I am having several problems with this.
I have tried the following
fid = fopen('data.tsv');
C = textscan(fid, ['%s' repmat('%f',1,8)], 'HeaderLines', 1);
fclose(fid);
and got some weird values that had nothing to do with my file. I also tried:
data = dlmread('data.tsv', '\t');
and got this
Error using dlmread (line 139)
Mismatch between file and format string.
Trouble reading number from file (row 1u, field 1u) ==> Participant Assessment Experiment Block Trial Answer Reaction Timestamp Free Response\n
Is there some way I can get it to ignore the header, or am I doing it totally wrong?

With dlmread you can specify where to start reading in the file. This is one of the few places where MATLAB indexing begins at 0: [0,0] refers to the first row, first column. Therefore, to ignore the first row (containing your header):
data = dlmread('data.tsv','\t', 1, 0);
This will only work if all the values (other than the header lines you skip) are numeric.
Your example with textscan also looks fine to me (provided that the format supplied is correct and there is indeed only one header line). C will be a cell array; to obtain the data from each column use C{n} where n is the column number.
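For example, a minimal sketch of pulling the columns out of C afterwards (assuming the format string matched every row, so the numeric columns all have the same length):
labels = C{1};        % first column: cell array of strings
values = [C{2:end}];  % columns 2-9 concatenated into an N-by-8 numeric matrix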
Rather than skipping the header line, it's sometimes useful to just read it in to a separate value:
fid = fopen('data.tsv');
C_header = textscan(fid, '%s',9);
C = textscan(fid, ['%s' repmat('%f',1,8)]);
fclose(fid);
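C_header is then a 1-by-1 cell holding a cell array of the header strings; a quick sketch of getting at them (the variable names are illustrative only):
headerNames = C_header{1};   % cell array containing the 9 header strings
firstName = headerNames{1};  % first column name as a char array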


Read hex from file matlab

What I have
A txt file like:
D091B
E7E1F
20823
...
What I need
To read them and store them as char, just as they are in the file: N lines (I don't know how many), each with its 5 characters (5 columns).
What have I tried
fichero = fopen('PS.txt','r');
sizeDatos = [[] 5]; % Several Options, read below
resultados=fscanf(fichero, '%s', sizeDatos); % Here too
fclose(fichero);
I've tried the snippet above to read my txt file, but I didn't manage to get it to work. The closest I've come is using:
sizeDatos = [1 Inf];
So I got all my hex characters into an array, with no spaces.
As you can see, I've tried several options for the fscanf size parameter, as well as trying to tell the format string to recognize new lines (using \n, for example). None of them have worked for me.
Any idea how I can get this? I've read the fscanf documentation page, but it didn't inspire me to try anything different.
One possible solution is to use textscan, which reads the data into a cell array.
fileId = fopen('PS.txt');
C = textscan(fileId, '%s');
Now, to show the contents of the cell array, you can use
celldisp(C)
Or you can convert it to other types.
Don't forget to close your file after using it.
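For instance, a minimal sketch of turning the result into the N-by-5 char array the question asks for (this assumes every line really has exactly 5 characters, as in the sample):
hexChars = char(C{1});   % N-by-5 char array, one row per line of the file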

Reading one column from a text file

What is the equivalent of fgetl and fgets in MATLAB for reading one column at a time (not a line) from a text file?
You cannot avoid reading the file. However, if your dataset is large, you can tell MATLAB to ignore the irrelevant parts while reading the file.
For instance, if your columns are space delimited, and you want to read the floating-point numbers in the first column, you can try the following:
fid = fopen('input.txt');
C = textscan(fid, '%f %*[^\n]');
C = C{:};
fclose(fid);
This still reads the entire file, but stores only the first column in memory.
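The same trick extends to other columns; for example, a hedged sketch that skips the first two columns and keeps only the third (still assuming space-delimited floating-point columns):
fid = fopen('input.txt');
C = textscan(fid, '%*f %*f %f %*[^\n]');  % skip columns 1-2, read column 3, ignore the rest
thirdCol = C{1};
fclose(fid);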

Textscan generates a vector twice the expected size

I want to load a csv file into a matrix using MATLAB.
I used the following code:
formatSpec = ['%*f', repmat('%f',1,20)];
fid = fopen(filename);
X = textscan(fid, formatSpec, 'Delimiter', ',', 'CollectOutput', 1);
fclose(fid);
X = X{1};
The csv file has 1000 rows and 21 columns.
However, the generated matrix X has 2000 rows and 20 columns.
I tried using different delimiters like '\t' or '\n', but it doesn't change.
When I displayed X, I noticed that it displayed the correct csv file but with extra rows of zeros every 2 rows.
I also tried adding the 'HeaderLines' parameter:
`X = textscan(fid, formatSpec, 'Delimiter', '\n', 'CollectOutput', 1, 'HeaderLines', 1);`
but this time, the result is an empty matrix.
Am I missing something?
EDIT: @horchler
I could read with no problem the 'test.csv' file.
There is no extra comma at the end of each row. I generated my csv file with a python script: I read the rows of another csv file, modified these (selecting some of them and doing arithmetic operations on them) and wrote the new rows on another csv file. In order to do this, I converted each element of the first csv file into floats...
New Edit:
Reading the textscan documentation more carefully, I think the problem is that my input file is neither a text file nor a str, but a file containing floats.
EDIT: three lines from the file
0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,1,0,0,0,2
1,-0.3834323,-1.92452324171,-1.2453254094,0.43455627857,-0.24571121,0.4340657,1,1,0,0,0,0.3517396202,1,0,0,0.3558122164,0.2936975319,0.4105696144,0,1,0
-0.78676,-1.09767,0.765554578,0.76579043,0.76,1,0,0,323124.235998,1,0,0,0,1,0,0,1,0,0,0,2
How about using regex?
X = [];
fid = fopen(filename);
while 1
    fl = fgetl(fid);
    if ~ischar(fl), break, end
    r = regexp(fl, '([-]*\d+[.]*\d*)', 'match');
    r = r(1:21); % because your 2nd line somehow has 22 elements;
                 % all lines must have the same number of elements,
                 % or an error will be thrown:
                 % Error: CAT arguments dimensions are not consistent.
    X = [X; r];
end
fclose(fid);
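Note that regexp with 'match' returns strings, so X ends up as a cell array of character vectors; if a numeric matrix is wanted, it can be converted afterwards, for example:
Xnum = str2double(X);   % numeric matrix of the same size (NaN wherever conversion fails)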
Using csvread to read a csv file seems like a good option. However, I also tend to read csv files with textscan, since files are sometimes badly written, so it is useful to have more than one way of reading them.
I run into reading problems like yours when I think a file is written one way but it is actually written another way. To debug it, I use fgetl and print, for each line read, both the output of fgetl and its double version (see the example below). By examining the double version, you may find which character is causing the problem.
In your case, I would first look for multiple occurrences of delimiters (',' and '\t') and, in textscan, I would activate the option 'MultipleDelimsAsOne' (while turning off 'CollectOutput'); a sketch of that call follows the debugging loop below.
fid = fopen(filename);
tline = fgetl(fid);
while ischar(tline)
    disp(tline);
    double(tline)
    pause;
    tline = fgetl(fid);
end
fclose(fid);
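If extra delimiters do turn out to be the problem, a hedged sketch of the textscan call described above, reusing the question's format string (with 'MultipleDelimsAsOne' on and 'CollectOutput' off):
fid = fopen(filename);
X = textscan(fid, ['%*f' repmat('%f',1,20)], 'Delimiter', ',', 'MultipleDelimsAsOne', true);
fclose(fid);
X = [X{:}];   % should give a 1000-by-20 numeric matrix if every row parses cleanly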

Problem concatenating a matrix of numbers with a vector of strings (column labels) using cell2mat

I'm a Mac user (10.6.8) using MATLAB to process calculation results. I output large tables of numbers to .csv files. I then use the .csv files in EXCEL. This all works fine.
The problem is that each column of numbers needs a label (a string header). I can't figure out how to concatenate labels to the table of numbers. I would very much appreciate any advice. Here is some further information that might be useful:
My labels are contained within a cell array:
columnsHeader = cell(1,15)
that I fill in with calculation results; for example:
columnsHeader{1} = propertyStringOne (where propertyStringOne = 'Liq')
The sequence of labels is different for each calculation. My first attempt was to try and concatenate the labels directly:
labelledNumbersTable=cat(1,columnsHeader,numbersTable)
I received an error that concatenated types need to be the same. So I tried converting the labels/strings using cell2mat:
columnsHeader = cell2mat(columnsHeader);
labelledNumbersTable = cat(1,columnsHeader,numbersTable)
But that took ALL the separate labels and concatenated them into one long word, which leads to:
??? Error using ==> cat
CAT arguments dimensions are not consistent.
Does anyone know of an alternative method that would allow me to keep my original cell array of labels?
You will have to handle writing the column headers and the numeric data to the file in two different ways. Outputting your cell array of strings will have to be done using the FPRINTF function, as described in this documentation for exporting cell arrays to text files. You can then output your numeric data by appending it to the file (which already contains the column headers) using the function DLMWRITE. Here's an example:
fid = fopen('myfile.csv','w'); %# Open the file
fprintf(fid,'%s,',columnsHeader{1:end-1}); %# Write all but the last label
fprintf(fid,'%s\n',columnsHeader{end}); %# Write the last label and a newline
fclose(fid); %# Close the file
dlmwrite('myfile.csv',numbersTable,'-append'); %# Append your numeric data
The solution to the problem is already shown by others. I am sharing a slightly different solution that improves performance especially when trying to export large datasets as CSV files.
Instead of using DLMWRITE to write the numeric data (which internally uses a for-loop over each row of the matrix), you can directly call FPRINTF to write the whole thing at once. You can see a significant improvement if the data has many rows.
Example to illustrate the difference:
%# some random data with column headers
M = rand(100000,5); %# 100K rows, 5 cols
H = strtrim(cellstr( num2str((1:size(M,2))','Col%d') )); %# headers
%# FPRINTF
tic
fid = fopen('a.csv','w');
fprintf(fid,'%s,',H{1:end-1});
fprintf(fid,'%s\n',H{end});
fprintf(fid, [repmat('%.5g,',1,size(M,2)-1) '%.5g\n'], M'); %# default precision = 5 significant digits
fclose(fid);
toc
%# DLMWRITE
tic
fid = fopen('b.csv','w');
fprintf(fid,'%s,',H{1:end-1});
fprintf(fid,'%s\n',H{end});
fclose(fid);
dlmwrite('b.csv', M, '-append');
toc
The timings on my machine were as follows:
Elapsed time is 0.786070 seconds. %# FPRINTF
Elapsed time is 6.285136 seconds. %# DLMWRITE

How do you create a matrix from a text file in MATLAB?

I have a text file which has 4 columns, each column having 65536 data points. Every element in the row is separated by a comma. For example:
X,Y,Z,AU
4010.0,3210.0,-440.0,0.0
4010.0,3210.0,-420.0,0.0
etc.
So, I have 65536 rows, each row having 4 data values as shown above. I want to convert it into a matrix. I tried importing data from the text file to an Excel file, because that way it's easy to create a matrix, but I lost more than half the data.
If all the entries in your file are numeric, you can simply use a = load('file.txt'). It should create a 65536x4 matrix a. It is even easier than csvread. (Note that the X,Y,Z,AU header line would have to be removed first, since load only accepts purely numeric data.)
Have you ever tried using 'importdata'?
The only parameters you need are the file name and the delimiter.
>> tmp_data = importdata('your_file.txt',',')
tmp_data =
data: [2x4 double]
textdata: {'X' 'Y' 'Z' 'AU'}
colheaders: {'X' 'Y' 'Z' 'AU'}
>> tmp_data.data
ans =
4010 3210 -440 0
4010 3210 -420 0
>> tmp_data.textdata
ans =
'X' 'Y' 'Z' 'AU'
Instead of messing with Excel, you should be able to read the text file directly into MATLAB (using the functions FOPEN, FGETL, FSCANF, and FCLOSE):
fid = fopen('file.dat','rt'); %# Open the data file
headerChars = fgetl(fid); %# Read the first line of characters
data = fscanf(fid,'%f,%f,%f,%f',[4 inf]).'; %# Read the data into a
%# 65536-by-4 matrix
fclose(fid); %# Close the data file
The easiest way to do it would be to use MATLAB's csvread function.
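For this particular file, csvread needs to be told to skip the header row; a small sketch (the row and column offsets are zero-based, and 'file.txt' is just a placeholder name):
data = csvread('file.txt', 1, 0);   % start reading at the second row, first column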
There is also this tool which reads CSV files.
You could also do it yourself without too much difficulty: just loop over each line in the file, split it on commas, and put the values in your array (a rough sketch is below).
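A rough sketch of that do-it-yourself loop (the file name is a placeholder, and this assumes one header line and four comma-separated values per row):
fid = fopen('file.txt','rt');
fgetl(fid);                      % discard the X,Y,Z,AU header line
data = zeros(0,4);
tline = fgetl(fid);
while ischar(tline)
    data(end+1,:) = str2double(regexp(tline, ',', 'split'));  % split on commas and convert
    tline = fgetl(fid);
end
fclose(fid);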
Suggest you familiarize yourself with dlmread and textscan.
dlmread is like csvread, but because it can handle any delimiter (tab, space, etc.), I tend to use it rather than csvread.
textscan is the real workhorse: it has lots of options, it works on already-open files, and it is a little more robust at handling "bad" input (e.g. non-numeric data in the file). It can be used like fscanf in gnovice's suggestion, but I think it is faster (don't quote me on that, though).
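For the file in the question, a hedged textscan version might look like this (reusing the file name from the example above and assuming one header line with four comma-separated numeric columns):
fid = fopen('file.dat','rt');
C = textscan(fid, '%f%f%f%f', 'Delimiter', ',', 'HeaderLines', 1, 'CollectOutput', true);
fclose(fid);
data = C{1};   % 65536-by-4 double matrix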