I have a semicolon-separated file in the following format:
Press;Temp.;CondF;Cond20;O2%;O2ppm;pH;NO3;Chl(a);PhycoEr;PhycoCy;PAR;DATE;TIME;excel.date;date.time
0.96;20.011;432.1;431.9;125.1;11.34;8.999;134;9.2;2.53;1.85;16.302;08.06.2011;12:01:52;40702;40702.0.5
1;20.011;433;432.8;125;11.34;9;133.7;8.19;3.32;2.02;17.06;08.06.2011;12:01:54;40702;40702.0.5
1.1;20.012;432.7;432.4;125.1;11.34;9;133.8;8.35;2.13;2.2;19.007;08.06.2011;12:01:55;40702;40702.0.5
1.2;20.012;432.8;432.5;125.2;11.35;9.001;133.8;8.45;2.95;1.95;21.054;08.06.2011;12:01:56;40702;40702.0.5
1.3;20.012;432.7;432.4;125.4;11.37;9.002;133.7;8.62;3.17;1.87;22.934;08.06.2011;12:01:57;40702;40702.0.5
1.4;20.007;432.1;431.9;125.2;11.35;9.003;133.7;9.48;4.17;1.6;24.828;08.06.2011;12:01:58;40702;40702.0.5
How can I parse this into a matrix in MATLAB? I don't care about the first row, but I would like the rest of the rows in the matrix. They don't need to be converted into doubles; the matrix can consist of strings. Each new line in the file marks the end of a row, and there is no semicolon before the newline.
Thanks for the help.
Consider this code to read the data:
fid = fopen('file.txt','rt');
frmt = [repmat('%f ',1,12) '%s %s %f %s'];
C = textscan(fid, frmt, 'Delimiter',';', 'CollectOutput',true, 'HeaderLines',1);
fclose(fid);
First we read into the variable C the different components: the first twelve columns as numbers, next two as strings (we will convert them to serial date numbers in the next step), another numeric column, and finally a string one:
>> C
C =
[6x12 double] {6x2 cell} [6x1 double] {6x1 cell}
As mentioned, we can parse and convert C{2} into a serial date number:
dt = datenum(strcat(C{2}(:,1),{' '},C{2}(:,2)), 'dd.mm.yyyy HH:MM:SS');
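To check that the format string parses the timestamps as intended, a single DATE/TIME pair from the first sample row can be converted on its own. This is a minimal sketch; the constant 693960 is the usual offset between MATLAB serial dates and Excel's 1900 date system (valid for dates after March 1900), and it is used here only as a cross-check against the excel.date column.

```matlab
% Sketch: one DATE/TIME pair copied from the first data row of the question.
dateStr = {'08.06.2011'};
timeStr = {'12:01:52'};
% strcat keeps the ' ' separator because all inputs are cell arrays.
stamp = strcat(dateStr, {' '}, timeStr);        % {'08.06.2011 12:01:52'}
dt = datenum(stamp, 'dd.mm.yyyy HH:MM:SS');      % serial date number
% Round-trip back to text as a visual check of the parsed value.
disp(datestr(dt, 'yyyy-mm-dd HH:MM:SS'))
% The whole-day part, shifted by the Excel offset, should match excel.date.
excelDay = floor(dt) - 693960;
```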
Now we can merge all of them into a cell array as a table. We use a cell array instead of a numeric matrix because the last column is still strings.
>> data = [num2cell([C{1} dt C{3}]) C{4}]
data =
Columns 1 through 7
[0.96] [20.011] [432.1] [431.9] [125.1] [11.34] [8.999]
[ 1] [20.011] [ 433] [432.8] [ 125] [11.34] [ 9]
[ 1.1] [20.012] [432.7] [432.4] [125.1] [11.34] [ 9]
[ 1.2] [20.012] [432.8] [432.5] [125.2] [11.35] [9.001]
[ 1.3] [20.012] [432.7] [432.4] [125.4] [11.37] [9.002]
[ 1.4] [20.007] [432.1] [431.9] [125.2] [11.35] [9.003]
Columns 8 through 14
[ 134] [ 9.2] [2.53] [1.85] [16.302] [7.3466e+05] [40702]
[133.7] [8.19] [3.32] [2.02] [ 17.06] [7.3466e+05] [40702]
[133.8] [8.35] [2.13] [ 2.2] [19.007] [7.3466e+05] [40702]
[133.8] [8.45] [2.95] [1.95] [21.054] [7.3466e+05] [40702]
[133.7] [8.62] [3.17] [1.87] [22.934] [7.3466e+05] [40702]
[133.7] [9.48] [4.17] [ 1.6] [24.828] [7.3466e+05] [40702]
Column 15
'40702.0.5'
'40702.0.5'
'40702.0.5'
'40702.0.5'
'40702.0.5'
'40702.0.5'
You can use textscan for this.
fid = fopen('data.txt');                           %open file
headers = fgetl(fid);                              %get first line
headers = textscan(headers,'%s','delimiter',';');  %split first line into column names
fmt = repmat('%s',1,size(headers{1,1},1));         %count columns and make format string
data = textscan(fid,fmt,'delimiter',';');          %read rest of the file
fclose(fid);                                       %close file
data = [data{:}];                                  %rows x columns cell array of strings
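Because the question only asks for strings, the same idea can be tried end to end on a small file. The sketch below assumes a hypothetical three-column excerpt written to a temporary file, and uses strsplit instead of textscan to split the header line; every field is read with %s, so the result is a cell array of strings.

```matlab
% Hypothetical three-column excerpt of the question's file.
fname = [tempname '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, '%s\n', 'Press;Temp.;DATE', '0.96;20.011;08.06.2011', '1;20.011;08.06.2011');
fclose(fid);

fid = fopen(fname, 'rt');
headers = strsplit(fgetl(fid), ';');          % column names from the first line
fmt = repmat('%s', 1, numel(headers));        % one %s per column
data = textscan(fid, fmt, 'Delimiter', ';');  % read the remaining lines
fclose(fid);
delete(fname);

data = [data{:}];                             % rows x columns cell array of strings
```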
Evening Max.
I'm going to assume that you are already able to import the data from a file or otherwise get it into MATLAB. The method that I normally use for data like this leaves it in a column cell array. Each cell then contains one line of data from the file.
You can then convert each cell to a character array and use regexp to split it, giving an easier-to-use matrix with the top row being your header data.
If you get stuck just post up some code and we can work through it.
Cheers!
Update:
Here's the code I was talking about.
A = importdata('filepath\sample.txt'); % This uses the newline on each line to make a new row.
B = [];
for n = 1:size(A,1)
    B = [B; regexp(A{n}, ';', 'split')]; % This uses the ; to split the string
end
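If the lines are already in a column cell array, regexp can split all of them in one call, which avoids growing B inside a loop. This is a sketch with two hypothetical lines standing in for A; it assumes every line has the same number of fields, otherwise vertcat fails.

```matlab
% Hypothetical stand-in for A: one line of the file per cell.
A = {'0.96;20.011;08.06.2011'; '1;20.011;08.06.2011'};
parts = regexp(A, ';', 'split');   % 2x1 cell, each element a 1x3 cell of fields
B = vertcat(parts{:});             % stack the rows into a 2x3 cell array
```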
MATLAB indexing is always done in (row, column) order, so something like matrix(2,3) returns the element in row 2, column 3. MATLAB also always indexes from 1, not 0 as in many other languages.
If you have a single row or a single column (commonly referred to as a vector), you can simply call matrix(4) to get the 4th element. You can also have three or more dimensions in an array if you so desire; think of it as a stack of matrices.
Cells are extremely useful for storing variable-length data in a single location. Data stored in a cell array is indexed in the same manner as a matrix, but for some uses you have to convert it from the cell type to a matrix (cell2mat). You'll learn those quickly. There are other ways to convert from cells as well, such as str2double for cells that contain numeric text, or cellfun for applying a function to every cell.
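A few common ways to get numbers out of cells, sketched on small hypothetical cell arrays:

```matlab
c = {'1.5', '2', 'abc'};
v = str2double(c);          % [1.5 2 NaN]: text that is not a number becomes NaN
n = {3, 4, 5};
m = cell2mat(n);            % [3 4 5]: works because every cell holds a number
lens = cellfun(@numel, c);  % [3 1 3]: applies numel to each cell
```

cell2mat only works when the contents fit together into one array; str2double and cellfun are the usual choices when the cells hold text.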
Hope that helps some more!
I have a [5x1] cell array in which all cells are column vectors, such as:
extInt =
[46x1 double]
[54x1 double]
[40x1 double]
[51x1 double]
[ 9x1 double]
I have a vector (vec) containing the indices of the cells in extInt that I need to extract, and then I have to convert these into a single column array. For example:
vec = [1,3];
Output = cell2mat(extInt{vec})
Output should become an array of size [86x1 double].
The way I have coded I get:
Error using cell2mat
Too many input arguments.
If possible, I would like to have a solution not using a loop.
The best approach here is to use cat along with a comma-separated list created by {} indexing to yield the expected column vector. We specify the first dimension as the first argument since all of the inputs are column vectors and we want the output to also be a column vector.
out = cat(1, extInt{vec})
Given your input, cell2mat attempts to concatenate along the second dimension which will fail for your data since all of the data have different number of rows. This is why (in your example) you had to transpose the data prior to calling cell2mat.
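Applied to data shaped like extInt (the contents here are random placeholders with the lengths shown in the question), the cat approach, its vertcat equivalent, and the parenthesis-indexed cell2mat call all give the same 86x1 result:

```matlab
% Placeholder data with the same shapes as extInt in the question.
extInt = {rand(46,1); rand(54,1); rand(40,1); rand(51,1); rand(9,1)};
vec = [1, 3];
out1 = cat(1, extInt{vec});    % comma-separated list, concatenated along dimension 1
out2 = vertcat(extInt{vec});   % same as cat(1, ...)
out3 = cell2mat(extInt(vec));  % () keeps a 2x1 cell, which cell2mat stacks vertically
```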
Update
Here is a benchmark to compare execution times between the cat and cell2mat approaches.
function benchit()
nRows = linspace(10, 1000, 100);
[times1, times2] = deal(zeros(size(nRows)));
for k = 1:numel(nRows)
rows = nRows(k);
data = arrayfun(@(x)rand(randi([10, 50], 1), 1), 1:rows, 'uni', 0);
vec = 1:2:numel(data);
times1(k) = timeit(@()cat_method(data, vec));
data = arrayfun(@(x)rand(randi([10, 50], 1), 1), 1:rows, 'uni', 0);
vec = 1:2:numel(data);
times2(k) = timeit(@()cell2mat_method(data, vec));
end
figure
hplot(1) = plot(nRows, times1 * 1000, 'DisplayName', 'cat');
hold on
hplot(2) = plot(nRows, times2 * 1000, 'DisplayName', 'cell2mat');
ylabel('Execution Times (ms)')
xlabel('# of Cell Array Elements')
legend(hplot)
end
function out = cat_method(data, vec)
out = cat(1, data{vec});
end
function out = cell2mat_method(data, vec)
out = cell2mat(data(vec)');
end
The reason for the constant offset between the two is that cell2mat calls cat internally but adds some additional logic on top of it. If you just use cat directly, you circumvent that additional overhead.
You have a small error in your code
Change
Output = cell2mat(extInt{vec});
to
Output = cell2mat(extInt(vec));
For cells, both curly braces and parentheses can be used to get information. You can read more about it in the documentation, but to summarize:
Use curly braces {} for setting or getting the contents of cell arrays.
Use parentheses () for indexing into a cell array to collect a subset of cells together in another cell array.
In your example, using curly braces with the index vector vec produces 2 separate outputs (I've made a shorter version of extInt below):
extInt = {[1],[2 3],[4 5 6]};
extInt{vec}
ans =
1
ans =
4 5 6
As these are 2 separate outputs, they also become 2 separate inputs to the function cell2mat. As this function only takes one input, you get an error.
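The two separate outputs can be made visible by capturing them explicitly. This small sketch reuses the shortened extInt; sub, first and last are hypothetical variable names:

```matlab
extInt = {1, [2 3], [4 5 6]};
vec = [1, 3];
sub = extInt(vec);            % () returns one output: a 1x2 cell array
[first, last] = extInt{vec};  % {} returns two outputs, one per selected cell
```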
One alternative is in your own solution: take the two outputs and place them inside a new (unnamed) cell.
{extInt{vec}}
ans =
[1] [1x3 double]
Now this (single) result goes into cell2mat without a problem.
(Note, though, that you might need to transpose the result first, depending on whether you have column or row vectors in your cell. The sizes of the vectors (or matrices) to combine need to match along the concatenation direction.)
Another way is to use parentheses (as in my solution above). Here a subset of the original cell array is returned, so it goes directly into the cell2mat function.
extInt(vec)
ans =
[1] [1x3 double]
I have been messing around and got this working by converting this entry into a new cell array and transposing it, so the dimensions remain compatible for the concatenation:
Output = cell2mat({extInt{vec}}')
Use
Output = cell2mat(extInt(vec))
since you want to address the cells in extInt, not the contents of the cells.
extInt(vec)
extInt{vec}
Try those to see what's going on.
I am currently trying to read data from a text file written exactly like this:
Height = 10
Length = 10
NodeX = 11
NodeY = 11
K = 10
I've written a small piece of code like this:
fileID = fopen('input.dat','r');
[a, b] = fscanf(fileID, '%s %f')
And I get the following answer:
a =
72
101
105
103
104
116
b =
1
It seems quite obvious I am not managing to specify the format correctly.
I would like to know how to pick a string along with a float multiple times in the same file.
As the documentation for fscanf states:
If formatSpec contains a combination of numeric and character
specifiers, then fscanf converts each character to its numeric
equivalent. This conversion occurs even when the format explicitly
skips all numeric values (for example, formatSpec is '%*d %s').
MATLAB can be annoyingly bad at reading mixed data types. One possible alternative is to read each line and split up your data using a simple regular expression:
fileID = fopen('input.dat','r');
mydata = {};
ii = 1;
while ~feof(fileID) % While we're not at the end of the file
tline = fgetl(fileID); % Get next line
mydata(ii,:) = regexp(tline, '([a-zA-Z]+) = (\d+)', 'tokens'); % + inside the group captures the whole name
ii = ii + 1;
end
fclose(fileID);
This returns a 5x1 cell array where each cell contains 2 cells (slightly annoying, but you can pull them out) that match your data. In this case, mydata{1}{1} is 'Height' and mydata{1}{2} is '10' (still text; use str2double to convert it).
Edit:
And you can flatten your cell array with a reshape call:
mydata = reshape([mydata{:}], 2, [])';
Which turns mydata in this case into a 5x2 cell array.
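The loop can also be made a bit more tolerant of the file format. The following is a sketch, not the answer's original code: it writes the sample input to a temporary file, stops on fgetl's end-of-file value (-1) instead of using feof, skips lines that do not match, allows optional spaces around =, and converts the values with str2double. The names and values variables are illustrative.

```matlab
% Write the sample input from the question to a temporary file.
fname = [tempname '.dat'];
fid = fopen(fname, 'wt');
fprintf(fid, 'Height = 10\nLength = 10\nNodeX = 11\nNodeY = 11\nK = 10\n');
fclose(fid);

fid = fopen(fname, 'r');
names = {};
values = [];
while true
    tline = fgetl(fid);
    if ~ischar(tline), break; end   % fgetl returns -1 at end of file
    tok = regexp(tline, '^\s*(\w+)\s*=\s*(\S+)', 'tokens', 'once');
    if isempty(tok), continue; end  % skip blank or malformed lines
    names{end+1, 1} = tok{1};                %#ok<AGROW>
    values(end+1, 1) = str2double(tok{2});   %#ok<AGROW>
end
fclose(fid);
delete(fname);
```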
The fscanf function is a low-level I/O function and is often not the best choice for such rather high-level file input. One alternative would be to use the textscan function, which allows quite advanced format specifications:
fileID = fopen('input.dat','r');
C = textscan(fileID,'%s = %d')
fclose(fileID);
which creates a 1x2 cell array. The first cell C{1} contains another 5x1 cell, where each field contains the name of the field, e.g. 'Height'. The second cell C{2} contains a 5x1 int32 vector with all integer values from the file.
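Since textscan also accepts a character vector instead of a file ID, the format can be tried without a file. The sketch below also shows one possible way (an assumption, not part of the answer) to turn the two columns into a struct with cell2struct, so that each value can be accessed by its name:

```matlab
% Same content as input.dat, held in a character vector.
txt = sprintf('Height = 10\nLength = 10\nNodeX = 11\nNodeY = 11\nK = 10\n');
C = textscan(txt, '%s = %d');   % C{1}: names, C{2}: int32 values
% One struct field per name, holding the value converted to double.
params = cell2struct(num2cell(double(C{2})), C{1}, 1);
```

After this, params.Height and params.NodeX hold the values read from the text.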
I want to import some data in an m-file. So far I have managed to create a cell array of the data. I want to convert it into a matrix. I used cell2mat but I get an error. I'm new to MATLAB, so I would like some help. Here is my complete code:
fid = fopen('vessel-movements.csv');
C = textscan(fid, '%f %f %f %f %f %s %s %s', 'HeaderLines', 1, 'Delimiter', ',')
fclose(fid);
iscell(C)
T = cell2mat(C)
The answer I get is:
C =
Columns 1 through 4
[300744x1 double] [300744x1 double] [300744x1 double] [300744x1 double]
Columns 5 through 8
[300744x1 double] {300744x1 cell} {300744x1 cell} {300744x1 cell}
ans =
1
??? Error using ==> cell2mat at 46
All contents of the input cell array must be of the same data type.
Error in ==> test at 5
T = cell2mat(C)
My question is: how can I do that? The data is in the file vessel-movements.csv. It contains numbers, such as IDs and coordinates, and timestamps.
As the error message says:
All contents of the input cell array must be of the same data type.
Columns 6, 7 and 8 are chars (date strings). It's not possible to convert them into a numeric matrix. Leave them in a cell.
You can transform only the numerical data into a matrix: data = cell2mat(C(:,1:5)). The remaining three columns have to be converted with datenum() into numerical times before they can be added to the data matrix.
If you have R2013b or later, you can read the file into a table instead: data = readtable('vessel-movements.csv');
I assume you only want to convert the first five columns of C, which are the ones that contain numeric data. You can use cell2mat as follows:
M = cell2mat(C(:,1:5));
or equivalently
M = [C{:,1:5}];
The main difference between a matrix and a cell array (in MATLAB parlance) is that a matrix holds elements of the same type and size, while a cell array holds elements of different types and sizes.
You read numbers and strings. The numbers do have the same type and size (double, 1×1) while the strings are different (they're all char type, but usually different sizes).
To group your numeric data, you must select only the numeric elements of your cell array:
N = horzcat(C{1:5});
while for the strings you should keep the cell array structure:
S = horzcat(C{6:8});
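When the column types are not known in advance, the numeric columns can also be selected by testing each cell instead of hard-coding 1:5 and 6:8. A sketch with a small hypothetical stand-in for C:

```matlab
% Hypothetical stand-in for C: two numeric columns and one text column.
C = {[1; 2], [3.5; 4.5], {'2011-06-08'; '2011-06-09'}};
isNum = cellfun(@isnumeric, C);  % logical mask, true for the numeric columns
N = [C{isNum}];                  % rows x (numeric columns) double matrix
S = [C{~isNum}];                 % rows x (text columns) cell array
```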
Later edit: Since you admit that you're new to MATLAB, I'm going to make a general recommendation: every time you see a function that you don't know, or one that behaves unexpectedly from your point of view, select its name and press F1. The MATLAB documentation is quite comprehensive and also contains lots of examples showing the typical uses of each function.
I want to import data from a text file with row and column headers, and store it in a matrix. For instance, the input file looks as follows:
data c1 c2 c3 c4
r1 1 2 3 4
r2 5 6 7 8
Also, is it possible to access the row and column names with the corresponding data element? And is it possible to modify that based on the result of operations?
Thanks in advance.
I would use textscan with an extra %*s in the format string to gobble up the first header column in each row. The first header row should be used to count the number of columns, in case it is unknown:
fid = fopen('input.txt'); %// Open the input file
%// Read the first header row and calculate the number of columns in the file
C = textscan(fid, '%s', 1, 'Delimiter', '\n', 'MultipleDelimsAsOne', true);
cols = numel(regexp(C{1}{1}, '\s*\w+'));
%// Read the rest of the rows and store the data values in a matrix
C = textscan(fid, ['%*s', repmat('%f', 1, cols - 1)]);
A = [C{:}]; %// Store the data in a matrix
fclose(fid); %// Close the input file
The data is stored in matrix A.
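Since the question also asks about accessing and modifying elements by row and column name, one option is to read the row headers with %s instead of skipping them with %*s, and to keep the column headers from the first line. This is a sketch under those assumptions: the sample file is written to a temporary location, the header line is split with strsplit, and strcmp is used for the name lookup.

```matlab
% Write the sample file from the question.
fname = [tempname '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, 'data c1 c2 c3 c4\nr1 1 2 3 4\nr2 5 6 7 8\n');
fclose(fid);

fid = fopen(fname, 'r');
header = strsplit(strtrim(fgetl(fid)));  % {'data','c1','c2','c3','c4'}
C = textscan(fid, ['%s' repmat('%f', 1, numel(header) - 1)]);
fclose(fid);
delete(fname);

colNames = header(2:end);  % column headers
rowNames = C{1};           % row headers, kept by reading them with %s
A = [C{2:end}];            % numeric block

% Look up an element by its names and modify it.
rowIdx = strcmp(rowNames, 'r2');
colIdx = strcmp(colNames, 'c3');
A(rowIdx, colIdx) = A(rowIdx, colIdx) * 10;
```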
From the documentation on readtable see http://www.mathworks.com/help/matlab/ref/readtable.html
T = readtable(filename, 'ReadVariableNames', true) if the first row has the headers (variable names)
or
T = readtable(filename, 'ReadRowNames', true) if the first column has the headers (row names)
You may also be interested in the 'HeaderLines' name-value pair if you'd like to skip more than just the first line. For a file like the one above, which has both kinds of headers, both options can be set to true in the same call.
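A sketch of the readtable route on the same sample, assuming a single space delimiter and a release with table support (R2013b or later). Brace indexing with a row name and a variable name reads or modifies one element:

```matlab
% Write the sample file from the question.
fname = [tempname '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, 'data c1 c2 c3 c4\nr1 1 2 3 4\nr2 5 6 7 8\n');
fclose(fid);

% First row -> variable names, first column -> row names.
T = readtable(fname, 'Delimiter', ' ', 'ReadVariableNames', true, 'ReadRowNames', true);
delete(fname);

v = T{'r2', 'c3'};       % look up one element by row name and variable name
T{'r2', 'c3'} = v * 10;  % modify it in place
```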
You could use importdata; for example, supposing the delimiter is a tab:
rawdata = importdata(filename, '\t');
row_names = rawdata.textdata(2:end,1);
col_names = rawdata.textdata(1, 2:end);
data_mat = rawdata.data;
row_names and col_names are cell arrays. If you would like each of them to be a single string delimited by \t, a comma, etc., you could use strjoin on them.
Consider the following file
var1 var2 variable3
1 2 3
11 22 33
I would like to load the numbers into a matrix, and the column titles into a variable that would be equivalent to:
variable_names = char('var1', 'var2', 'variable3');
I don't mind splitting the names and the numbers into two files; however, preparing MATLAB code files and eval'ing them is not an option.
Note that there can be an arbitrary number of variables (columns).
I suggest importdata for operations like this:
d = importdata('filename.txt');
The return value is a struct with the numerical data in a field called 'data' and the column headers in a field called 'colheaders'.
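For the file in the question, a sketch might look like the following. It assumes a single space delimiter and exactly one header line, passed explicitly to importdata; char pads the names into the character matrix the question asks for.

```matlab
% Write the sample file from the question.
fname = [tempname '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, 'var1 var2 variable3\n1 2 3\n11 22 33\n');
fclose(fid);

d = importdata(fname, ' ', 1);        % space delimiter, one header line
delete(fname);

values = d.data;                      % numeric matrix
variable_names = char(d.colheaders);  % padded char matrix, one name per row
```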
Another useful interface for importing and manipulating data like these is the 'dataset' class available in the Statistics Toolbox.
If the header is on the first row, then
A = dlmread(filename, delimString, 1, 0);
will read the numeric data into the matrix A (the row and column offsets are zero-based, so 1, 0 skips only the header row).
You can then use
fid = fopen(filename);
headerString = fgetl(fid); % reads the header line into a string
fclose(fid);
and then strtok (in a loop) or strsplit to split headerString into a cell array. This is one approach I can think of to deal with an unknown number of columns.
Just use textscan with different format specifiers.
fid = fopen(filename,'r');
heading = textscan(fid,'%s %s %s',1);
fgetl(fid); %advance the file pointer one line
data = textscan(fid,'%n %n %n');%read the rest of the data
fclose(fid);
In this case 'heading' will be a cell array with one cell per column heading, so you will have to convert it into a cell array of strings or whatever it is that you want. 'data' will be a cell array containing a numeric array for each column that you read, so you will have to cat them together to make one matrix.
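Because the number of columns is arbitrary, the format specifiers can be generated from the header line instead of being typed out. A sketch of that variant, using fgetl and strsplit in place of the first textscan call, on a temporary copy of the sample file:

```matlab
% Write the sample file from the question.
fname = [tempname '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, 'var1 var2 variable3\n1 2 3\n11 22 33\n');
fclose(fid);

fid = fopen(fname, 'r');
heading = strsplit(strtrim(fgetl(fid)));                % works for any number of columns
data = textscan(fid, repmat('%f', 1, numel(heading)));  % one numeric column per heading
fclose(fid);
delete(fname);

data = [data{:}];                 % cat the column vectors into one matrix
variable_names = char(heading);   % padded char matrix of names
```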