How to import data with row and column headers - matlab

I want to import data from a text file with row and column headers, and store it in a matrix. For instance, the input file looks as follows:
data c1 c2 c3 c4
r1 1 2 3 4
r2 5 6 7 8
Also, is it possible to access the row and column names with the corresponding data element? And is it possible to modify that based on the result of operations?
Thanks in advance.

I would use textscan with an extra %*s in the format string to gobble up the first header column in each row. The first header row should be used to count the number of columns, in case it is unknown:
fid = fopen('input.txt'); %// Open the input file
%// Read the first header row and calculate the number of columns in the file
C = textscan(fid, '%s', 1, 'Delimiter', '\n', 'MultipleDelimsAsOne', true);
cols = numel(regexp(C{1}{1}, '\s*\w+'));
%// Read the rest of the rows and store the data values in a matrix
C = textscan(fid, ['%*s', repmat('%f', 1, cols - 1)]);
A = [C{:}]; %// Store the data in a matrix
fclose(fid); %// Close the input file
The data is stored in matrix A.
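The question also asks how to keep the row and column names and look values up by them. A minimal sketch, assuming a whitespace-delimited file like the one in the question: instead of skipping the row header with %*s, it reads it with %s and keeps the header line as a cell array. The temporary file (from tempname) and the new column name 'c1_x10' are only for illustration:

```matlab
% Write the sample file from the question to a temporary location
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'data c1 c2 c3 c4\nr1 1 2 3 4\nr2 5 6 7 8\n');
fclose(fid);

fid = fopen(fname, 'r');
% Header line: split on whitespace and drop the corner label ('data')
hdr = strsplit(strtrim(fgetl(fid)));
colNames = hdr(2:end);
% Remaining lines: one string (the row name) followed by numeric columns
C = textscan(fid, ['%s' repmat('%f', 1, numel(colNames))]);
fclose(fid);
delete(fname);

rowNames = C{1};      % cell array {'r1'; 'r2'}
A = [C{2:end}];       % numeric matrix [1 2 3 4; 5 6 7 8]

% Look up an element by its row and column names
val = A(strcmp(rowNames, 'r2'), strcmp(colNames, 'c3'));   % 7

% Names and data are ordinary arrays, so both can be updated after an operation
k = find(strcmp(colNames, 'c1'));
A(:, k) = A(:, k) * 10;
colNames{k} = 'c1_x10';
```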

From the documentation on readtable, see http://www.mathworks.com/help/matlab/ref/readtable.html
T = readtable(filename, 'ReadVariableNames', true) if the first row has the headers (the column names)
or
T = readtable(filename, 'ReadRowNames', true) if the first column has the headers (the row names)
Your file has both, so you can pass both name-value pairs in a single call. You may also be interested in the 'HeaderLines' name-value pair if you'd like to skip more than just the first line.

You could use importdata; for example, assuming the delimiter is a tab:
rawdata = importdata(filename, '\t');
row_names = rawdata.textdata(2:end,1);
col_names = rawdata.textdata(1, 2:end);
data_mat = rawdata.data;
The row_names and col_names are cell arrays. If you would like each of them as a single string delimited by \t, a comma, etc., you could use strjoin on them.
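A short sketch of the strjoin step, using hypothetical names shaped like rawdata.textdata(1, 2:end) and rawdata.textdata(2:end, 1). The row names come out as a column cell array, so they are transposed to a row first, which strjoin accepts in all releases:

```matlab
% Hypothetical header names, shaped like the importdata output
col_names = {'c1', 'c2', 'c3', 'c4'};   % 1-by-N row cell array
row_names = {'r1'; 'r2'};               % N-by-1 column cell array

col_str = strjoin(col_names, ', ');           % comma-delimited string
row_str = strjoin(row_names', ', ');          % transpose to a row cell first
tab_str = strjoin(col_names, sprintf('\t'));  % tab-delimited header line
```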

Related

How do I read comma separated doubles from text file into MATLAB?

I have a text file called Output.txt that looks like this:
0.000000,0.550147,0.884956
1.000000,0.532486,0.847458
2.000000,0.501333,0.800000
3.000000,0.466418,0.746269
4.000000,0.409492,0.662252
5.000000,0.327257,0.520833
6.000000,0.267376,0.425532
7.000000,0.188427,0.296736
8.000000,0.115824,0.180505
9.000000,0.062768,0.099108
I need to read in the three values separated by commas into MATLAB as 3 different vectors. They can be called anything but C1, C2, and C3 could work.
C1 would contain [0.000000,1.000000,2.000000, ...], C2 would contain [0.550147,0.532486,...] and C3 would contain the values in the third column [0.884956,0.847458,...].
I tried using the following but I'm having problems getting it to work correctly:
File = 'Output.txt';
f = fopen(File, 'r');
C = textscan(f, '%f%f%f', 'Delimiter', ',');
fclose(f);
This gives me a 1x3 cell array C, but each cell in C is 1x100 and does not contain the correct numbers.
You have a Comma Separated Value file, so you can simply use csvread to read in your matrix:
C = csvread('Output.txt');
where C is now a matrix containing all your values, which you can index by rows and columns. I'd recommend against creating separate column vectors; rather, use C(:,1) for the first column, and so on.
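A minimal sketch of the csvread approach on the first three lines of Output.txt, written to a temporary file for illustration. It shows the column indexing recommended above instead of separate variables (C1, C2, C3 are created here only to match the names in the question). Newer MATLAB releases recommend readmatrix over csvread, but csvread still works:

```matlab
% Write the first three lines of Output.txt to a temporary file
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '0.000000,0.550147,0.884956\n');
fprintf(fid, '1.000000,0.532486,0.847458\n');
fprintf(fid, '2.000000,0.501333,0.800000\n');
fclose(fid);

M = csvread(fname);   % 3-by-3 double matrix, one row per line
delete(fname);

% Columns are just slices of M
C1 = M(:, 1);         % [0; 1; 2]
C2 = M(:, 2);
C3 = M(:, 3);
```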

Exporting blank values into a .txt file - MATLAB

I'm currently trying to export multiple matrices of unequal lengths into a delimited .txt file, so I have been padding the shorter matrices with 0's so that they can be concatenated horizontally for dlmwrite without error:
dlmwrite(filename{1},[a,b],'delimiter','\t')
Ideally, however, I do not want the zeros to appear in the .txt file itself; I would rather leave those entries blank.
Currently the .txt file looks like this:
55875 3.1043e+05
56807 3.3361e+05
57760 3.8235e+05
58823 4.2869e+05
59913 4.3349e+05
60887 0
61825 0
62785 0
63942 0
65159 0
66304 0
67509 0
68683 0
69736 0
70782 0
But I want it to look like this:
55875 3.1043e+05
56807 3.3361e+05
57760 3.8235e+05
58823 4.2869e+05
59913 4.3349e+05
60887
61825
62785
63942
65159
66304
67509
68683
69736
70782
Is there any way I can do this? Is there an alternative to dlmwrite that does not require matrices of equal lengths?
If a is always longer than b, you could split a into two parts, one the same length as b and the rest:
a = [1 2 3 4 5 6 7 8]';
b = [9 8 7 ]';
len = numel(b);
dlmwrite( 'foobar.txt', [a(1:len), b ], 'delimiter', '\t' );
dlmwrite( 'foobar.txt', a(len+1:end), 'delimiter', '\t', '-append');
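If neither vector is guaranteed to be the longer one, or there are more than two columns, one more general option (not taken from either answer here) is to skip dlmwrite and write each line yourself with fopen/fprintf, emitting an empty field where a shorter vector has no entry. A minimal sketch; the cell array `cols` and its contents are hypothetical sample data:

```matlab
% Hypothetical column vectors of unequal length, stored in a cell array
cols = {[55875; 56807; 57760], [3.1043e+05; 3.3361e+05]};
fname = [tempname '.txt'];
tab = sprintf('\t');

nRows = max(cellfun(@numel, cols));
fid = fopen(fname, 'w');
for r = 1:nRows
    fields = cell(1, numel(cols));
    for k = 1:numel(cols)
        if r <= numel(cols{k})
            fields{k} = num2str(cols{k}(r));  % number -> text
        else
            fields{k} = '';                   % missing entry -> blank field
        end
    end
    % Drop trailing blank fields so short rows do not end with a tab
    last = find(~cellfun(@isempty, fields), 1, 'last');
    fprintf(fid, '%s\n', strjoin(fields(1:last), tab));
end
fclose(fid);

written = fileread(fname);   % read the result back for inspection
delete(fname);
```

Blank fields in the middle of a row keep their tab separators, so the columns stay aligned when the file is read back.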
Alternatively, you can convert the numeric data to strings and pad it with whitespace so that the final output is a cell array of strings, which is easy to write to the output text file.
Stage 1: Build a cell array of strings from the numeric column vectors a, b, c, and so on:
%// Concatenate all arrays into a cell array with numeric data
A = [{a} {b} {c}]; %// Edit this to add more columns
%// Create a "regular" 2D shaped cell array to store the cells from A
lens = cellfun('length',A);
max_lens = max(lens);
A_reg = cell(max_lens,numel(lens));
A_reg(:) = {''};
%// Fill the top lens(k) cells of column k with the numbers as strings
A_reg(bsxfun(@le,(1:max_lens)',lens)) = cellstr(num2str(vertcat(A{:})));
%// Create a char array that has string data from input arrays as strings
wsp = repmat({' '},max_lens,1); %// Create whitespace cell array
out_char = [];
for iter = 1:numel(A)
    out_char = [out_char char(A_reg(:,iter)) char(wsp)];
end
out_cell = cellstr(out_char);
Stage 2: Now that you have out_cell, the cell array holding the strings to be written to the text file, you have two options for the writing operation itself.
Option 1 -
dlmwrite('results.txt',out_cell(:),'delimiter','')
Option 2 -
outfile = 'results.txt';
fid = fopen(outfile,'w');
for row = 1:numel(out_cell)
fprintf(fid,'%s\n',out_cell{row});
end
fclose(fid);

How to save data transposed in a tab-delimited file

Assume you have a MATLAB array with 5 rows and n columns.
How do you save it to a file so that each column of the array becomes a line, as follows:
column1 becomes line1, and so on.
I need this without commas between elements, so it should be something along the lines of
dlmwrite('pointcloud.pts', cloud, 'delimiter', '\t');
which writes each row of cloud as one line,
but I want column one to be saved as line one.
I think you only have to transpose your matrix. Here's an example:
n = 7;
test = rand(5, n);
dlmwrite('pointcloud.pts', test', 'delimiter', '\t');
This works fine for me: ' is the transpose operator. (Strictly, ' is the complex conjugate transpose; for real data it is the same as the plain transpose .') Or did I misunderstand you?
EDIT: I think you are still saving the non-transposed matrix, so in your case the first 443250 elements of the first row still end up in the first row of your file. Transposing the data with the apostrophe (') lets you store it correctly. Have a look at my code: there is an apostrophe (the transpose operator) right after test.
You can see the effect, for example, by typing:
a = rand(2, 4);
a_transposed = a';
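A small round trip that makes the effect visible, assuming a tiny stand-in for cloud (5 rows, 3 columns) and a temporary file name: after writing the transposed matrix, line k of the file holds column k of the original. It uses .' (plain transpose), which for real data gives the same result as ':

```matlab
% Small stand-in for cloud: 5 rows by n columns, filled column by column
n = 3;
cloud = reshape(1:5*n, 5, n);    % column k holds 5k-4 ... 5k

fname = [tempname '.pts'];
% Transpose so that each column of cloud becomes one line of the file
dlmwrite(fname, cloud.', 'delimiter', '\t');

back = dlmread(fname, '\t');     % n-by-5: line k equals column k of cloud
delete(fname);
```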

Matlab - how to read a file with a large number of integer columns using textscan

I have a file which contains data in the following format:
filename.jpg,132,234,234,345,4555,23333,344,...,333
I have put ... to mark the fact that I have a long sequence of integers. On each line I have a total of 132 integers.
I want to read the numbers in a matrix with 132 columns and as many rows as I have in the input file. How can I read this data with textscan function? How should I specify this type of format? I also want to read the first column of filenames into a cell array.
For the cell array I have used the following syntax:
fid = fopen(inputPath);
buffer = textscan(fid, '%s%*[^\n]', 'Delimiter', ',');
fclose(fid);
You can follow your first call to textscan with a csvread instead:
A = csvread('data.txt', 0, 1);
The last two arguments are the zero-based row and column offsets at which your data starts. Your cell array then holds the strings from the first column, and A holds the numeric data as a matrix.
Otherwise, if you really have to use textscan, build the format string separately:
fid = fopen('data.txt', 'r');
% create a string with as many %f as you need
fmt = ['%s' repmat('%f', 1, 132)];
buffer = textscan(fid, fmt, 'Delimiter', ',');
names = buffer{1};
A = [buffer{2:end}];
fclose(fid);
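The same idea on a scaled-down sample, assuming 4 integers per line instead of 132 and a temporary file name: repmat builds '%s%f%f%f%f', the first output cell holds the file names and the rest are concatenated into the matrix. If an integer class is needed, '%d' (int32) could be used in place of '%f':

```matlab
% Hypothetical sample: a file name followed by nInts integers on each line
nInts = 4;                         % 132 in the question
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'a.jpg,1,2,3,4\nb.jpg,5,6,7,8\n');
fclose(fid);

fid = fopen(fname, 'r');
fmt = ['%s' repmat('%f', 1, nInts)];           % '%s%f%f%f%f'
buffer = textscan(fid, fmt, 'Delimiter', ',');
fclose(fid);
delete(fname);

names = buffer{1};                 % cell array of file names
A = [buffer{2:end}];               % rows-by-nInts double matrix
```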

How can I parse this semicolon delimited file?

I have a semicolon separated file in the following format:
Press;Temp.;CondF;Cond20;O2%;O2ppm;pH;NO3;Chl(a);PhycoEr;PhycoCy;PAR;DATE;TIME;excel.date;date.time
0.96;20.011;432.1;431.9;125.1;11.34;8.999;134;9.2;2.53;1.85;16.302;08.06.2011;12:01:52;40702;40702.0.5
1;20.011;433;432.8;125;11.34;9;133.7;8.19;3.32;2.02;17.06;08.06.2011;12:01:54;40702;40702.0.5
1.1;20.012;432.7;432.4;125.1;11.34;9;133.8;8.35;2.13;2.2;19.007;08.06.2011;12:01:55;40702;40702.0.5
1.2;20.012;432.8;432.5;125.2;11.35;9.001;133.8;8.45;2.95;1.95;21.054;08.06.2011;12:01:56;40702;40702.0.5
1.3;20.012;432.7;432.4;125.4;11.37;9.002;133.7;8.62;3.17;1.87;22.934;08.06.2011;12:01:57;40702;40702.0.5
1.4;20.007;432.1;431.9;125.2;11.35;9.003;133.7;9.48;4.17;1.6;24.828;08.06.2011;12:01:58;40702;40702.0.5
How can I parse this into a matrix in matlab? I don't care about the first row, but I would like the rest of the rows in the matrix. They don't need to be converted into doubles, the matrix can be comprised of strings. There are new lines in the file, which represent the end of a row. There are no semicolons before the new lines.
Thanks for the help.
Consider this code to read the data:
fid = fopen('file.txt','rt');
frmt = [repmat('%f ',1,12) '%s %s %f %s'];
C = textscan(fid, frmt, 'Delimiter',';', 'CollectOutput',true, 'HeaderLines',1);
fclose(fid);
First we read into the variable C the different components: the first twelve columns as numbers, next two as strings (we will convert them to serial date numbers in the next step), another numeric column, and finally a string one:
>> C
C =
[6x12 double] {6x2 cell} [6x1 double] {6x1 cell}
As I mentioned, we can parse and convert C{2} into a serial date number:
dt = datenum(strcat(C{2}(:,1),{' '},C{2}(:,2)), 'dd.mm.yyyy HH:MM:SS');
Now we can merge all of them into a cell array as a table. We use a cell array instead of a numeric matrix because the last column still contains strings.
>> data = [num2cell([C{1} dt C{3}]) C{4}]
data =
Columns 1 through 7
[0.96] [20.011] [432.1] [431.9] [125.1] [11.34] [8.999]
[ 1] [20.011] [ 433] [432.8] [ 125] [11.34] [ 9]
[ 1.1] [20.012] [432.7] [432.4] [125.1] [11.34] [ 9]
[ 1.2] [20.012] [432.8] [432.5] [125.2] [11.35] [9.001]
[ 1.3] [20.012] [432.7] [432.4] [125.4] [11.37] [9.002]
[ 1.4] [20.007] [432.1] [431.9] [125.2] [11.35] [9.003]
Columns 8 through 14
[ 134] [ 9.2] [2.53] [1.85] [16.302] [7.3466e+05] [40702]
[133.7] [8.19] [3.32] [2.02] [ 17.06] [7.3466e+05] [40702]
[133.8] [8.35] [2.13] [ 2.2] [19.007] [7.3466e+05] [40702]
[133.8] [8.45] [2.95] [1.95] [21.054] [7.3466e+05] [40702]
[133.7] [8.62] [3.17] [1.87] [22.934] [7.3466e+05] [40702]
[133.7] [9.48] [4.17] [ 1.6] [24.828] [7.3466e+05] [40702]
Column 15
'40702.0.5'
'40702.0.5'
'40702.0.5'
'40702.0.5'
'40702.0.5'
'40702.0.5'
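The date step can be checked in isolation. A minimal sketch, assuming two hypothetical DATE/TIME pairs in the file's layout; in datenum format strings mm is the month, MM the minutes and SS the seconds, and the {' '} cell keeps the space that strcat would otherwise trim:

```matlab
% Two DATE/TIME pairs in the same layout as the file
d = {'08.06.2011'; '08.06.2011'};
t = {'12:01:52'; '12:01:54'};

% Join date and time with a space, then parse into serial date numbers
dt = datenum(strcat(d, {' '}, t), 'dd.mm.yyyy HH:MM:SS');
v = datevec(dt);                % rows of [year month day hour minute second]
gapSeconds = (dt(2) - dt(1)) * 86400;
```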
You can use textscan for this.
fid = fopen('data.txt'); %open file
headers = fgetl(fid); %get first line
headers = textscan(headers,'%s','delimiter',';'); %read first line
fmt = repmat('%s',1,size(headers{1,1},1)); %count columns and make format string (fmt avoids shadowing the built-in format)
data = textscan(fid,fmt,'delimiter',';'); %read rest of the file
fclose(fid); %close file
data = [data{:}]; %combine the columns into one cell array of strings
Evening Max.
I'm going to assume that you are already able to import the data from a file or otherwise get it into MATLAB. The method I normally use for data like this leaves it in a column cell array, where each cell contains one line of data from the file.
You can then use regexp on each line's character data to split it into an easier-to-use cell array, with the top row holding your header data.
If you get stuck just post up some code and we can work through it.
Cheers!
Update:
Here's the code I was talking about.
A = importdata('filepath\sample.txt'); %This uses the newline on each line to make a new row.
B = [];
for n = 1:size(A,1)
    B = [B; regexp(A{n},';','split')]; %This uses the ; to split the string
end
MATLAB indexing is always done in (row, column) format, so something like matrix(2,3) returns the element in row 2, column 3. MATLAB also always indexes from 1, not 0 as in many other languages.
If you have a single row or a single column (commonly referred to as a vector), you can simply use matrix(4) to get the 4th element. Matrices can also have 3 or more dimensions if you need them; think of a stack of matrices.
Cells are extremely useful for storing variable-length data in a single variable. A cell array is indexed like a matrix, but parentheses return cells (A(n) is a 1x1 cell) while braces return the contents (A{n} is the string itself). For some uses you have to convert from cells to a regular array, for example with cell2mat; you'll learn those pretty quickly. There are other ways to convert from cells as well, such as str2double for cells that hold numbers stored as text.
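A short illustration of parentheses versus braces on a cell array, assuming a hypothetical row of strings like one produced by regexp(..., 'split') on a line of the file. str2double converts every cell at once and returns NaN for text that is not a number, such as the date:

```matlab
% Hypothetical row of strings, like one line split on ';'
row = {'0.96', '20.011', '08.06.2011'};

c = row(2);                 % parentheses: a 1x1 cell containing '20.011'
s = row{2};                 % braces: the char contents '20.011'
nums = str2double(row);     % numeric row vector; non-numbers become NaN
```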
Hope that helps some more!