How can I import a CSV file into MATLAB? A row in the file I am working with looks like:
SUNW,2-Jan-98,1998,5,40.125,41.5
There are 36 columns and 10107 rows. The first row contains the column headers. It seems that MATLAB doesn't support importing this kind of CSV file. The following textscan call reads all of the data into one cell array.
data = textscan(fid, '%s %s %d %d %f %f', ...
'HeaderLines',1, 'Delimiter',',', 'CollectOutput',1);
Is there a way to read the data into a separate variable for each column?
Example 6 in the textscan documentation seems to cover the use case that you're interested in:
Using a text editor, create a comma-delimited file data2.csv that
contains the lines
abc, 2, NA, 3, 4
// Comment Here
def, na, 5, 6, 7
Designate the input that textscan should treat as comments or empty
values:
fid = fopen('data2.csv');
C = textscan(fid, '%s %n %n %n %n', 'delimiter', ',', ...
'treatAsEmpty', {'NA', 'na'}, ...
'commentStyle', '//');
fclose(fid);
textscan returns a 1-by-5 cell array C with the following cells:
C{1} = {'abc'; 'def'}
C{2} = [2; NaN]
C{3} = [NaN; 5]
C{4} = [3; 6]
C{5} = [4; 7]
While it doesn't explicitly assign each column to a separate variable, you can easily do something like col1 = C{1};.
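As a minimal sketch of that last step, the cell array returned by textscan can be unpacked into one variable per column in a single assignment. The snippet below parses an inline string instead of a file so that it runs on its own; the second data row and all variable names are made up for illustration. Note that 'CollectOutput' is left out, so each column stays in its own cell.

```matlab
% Inline sample standing in for the CSV file; the second row is hypothetical.
txt = sprintf('SUNW,2-Jan-98,1998,5,40.125,41.5\nSUNW,5-Jan-98,1998,5,41.5,43.0');

% Without 'CollectOutput', textscan returns one cell per format specifier.
C = textscan(txt, '%s %s %d %d %f %f', 'Delimiter', ',');

% C{:} expands to a comma-separated list, so each column can be
% assigned to its own (hypothetically named) variable in one statement.
[ticker, dateStr, yr, n, openPrice, closePrice] = C{:};
```

When reading from a file, pass the file identifier and 'HeaderLines', 1 as in the question; the unpacking line stays the same.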
If you have MATLAB R2011b, you can use the spreadsheet import tool.
I am trying to read a large 3 GB text file into MATLAB. The file has a header row with column names and is space-delimited (see fruit.txt below). However, the only data I need is the Grapes column. Since the file is huge, I am using the loop below to read only one column into MATLAB. How can I read in just one column of data with this loop? I have to use a loop and preselect the needed columns because the file is over 3 GB.
fruit.txt
Apples Grapes Oranges
3 4 A
4 G 1
6 A 3
3 4 1
A 6 1
2 2 4
filename = 'fruit.txt'
delimiter = ' ';
formatSpec = '%s%s%s[^\n\r]';
fileID = fopen(filename, 'r' ) ;
out = {};
k = 0 ;
while ~feof(fileID)
k = k+1;
C = textscan(fileID, formatSpec, 'Delimiter', delimiter);
out{end+1} = Grapes{:,2};
end
Use readmatrix and specify one header row and that you only want column 2:
readmatrix(filename, 'FileType','text', 'Delimiter', delimiter, 'NumHeaderLines', 1, 'Range', 'B:B');
I would like to display the data from only one column of a .csv file in a matrix. Each row has several integers (three, to be precise) in a single cell, separated by semicolons. Here is an example of what the data looks like:
A B
1;2;3
4;5;6
(note that A means column A, column B is empty)
The desired output would be an array in Matlab with 3 columns and 2 rows.
>> matrixFromCsvFile=
1 2 3
4 5 6
What I tried:
fid = fopen('test.csv');
matrixFromCsvFile = textscan(fid, '%d %d %d', 'delimiter', ';')
fclose(fid);
Instead of the desired output I got this:
>> matrixFromCsvFile =
[2x1 int32] [2x1 int32] [2x1 int32]
>> matrixFromCsvFile{1}
>> ans =
1
4
Did I really just create 3 arrays within an array? I want just one. Luckily, the 1 and 4 values are correct. This already took me a long time to achieve, and I'm stuck.
You can fix your example just by adding a CollectOutput flag to textscan:
M = textscan(fid, '%d %d %d', 'delimiter', ';','CollectOutput',1);
By default textscan outputs columns separately (so your data is there, just in e.g. M{1}, M{2}, M{3}). Setting CollectOutput puts consecutive columns of the same class into a single array.
e.g. this would give me five columns in five arrays:
M = textscan(fid, '%d %d %f %f %f');
This would give me two arrays, one containing the first two columns, one containing the last three:
M = textscan(fid, '%d %d %f %f %f','CollectOutput',1);
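The difference can be checked with a short sketch that parses the semicolon-separated sample from the question as an inline string rather than a file (an assumption made so the snippet is self-contained). Even with CollectOutput set, textscan still returns a cell array, so the matrix itself has to be pulled out with M{1}.

```matlab
% Inline stand-in for the two data rows of test.csv
txt = sprintf('1;2;3\n4;5;6');

% Default behaviour: one cell per column
separate = textscan(txt, '%d %d %d', 'Delimiter', ';');

% With CollectOutput: consecutive columns of the same class are merged
collected = textscan(txt, '%d %d %d', 'Delimiter', ';', 'CollectOutput', 1);

% The result is a 1-by-1 cell; its content is the 2-by-3 int32 matrix
matrixFromCsvFile = collected{1};
```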
Use importdata:
M = importdata('test.csv',';',1)
matrixFromCsvFile = M.data
You could go on with
matrixFromCsvFile = cell2mat(matrixFromCsvFile);
Another question on fprintf
I have a matrix s(n,5) that I want to shorten (just take columns 3,4 and 5) into s1(n,3) and save with a different name.
s1=s(:,3:5);
txtfilename = [Filename '-1.txt'];
% Open a file for writing
fid = fopen(txtfilename, 'w');
% print values in column order
% three values appear on each row of the file
fprintf(fid, '%f %f %f\n', s1);
fclose(fid);
I don't think I understood the way to use fprintf and rewrite my new matrix, because it is sorting the values.
Thanks for your help
The problem is that MATLAB stores data in column-major order, meaning that when you do s1(:), the first three values are the first three values in the first column, not the first row. (This is the order in which fprintf reads values out of s1.) For example:
>> M = magic(3)
M =
8 1 6
3 5 7
4 9 2
>> M(:)
ans =
8
3
4
1
5
9
6
7
2
You can simply transpose the matrix to output the way you want:
fprintf(fid, '%f %f %f\n', s1.');
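The effect is easy to see without writing a file, since sprintf consumes its arguments in the same column-major order as fprintf. The sketch below uses magic(3) and a %d format purely for readability; the variable names are illustrative.

```matlab
M = magic(3);   % [8 1 6; 3 5 7; 4 9 2]

% Passing M directly walks down the columns: each output line is a column of M
byColumns = sprintf('%d %d %d\n', M);

% Transposing first makes each output line a row of M
byRows = sprintf('%d %d %d\n', M.');

disp(byColumns);
disp(byRows);
```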
I want to import data from a text file with row and column headers, and store it in a matrix. For instance, the input file looks as follows:
data c1 c2 c3 c4
r1 1 2 3 4
r2 5 6 7 8
Also, is it possible to access the row and column names with the corresponding data element? And is it possible to modify that based on the result of operations?
Thanks in advance.
I would use textscan with an extra %*s in the format string to gobble up the first header column in each row. The first header row should be used to count the number of columns, in case it is unknown:
fid = fopen('input.txt'); %// Open the input file
%// Read the first header row and calculate the number of columns in the file
C = textscan(fid, '%s', 1, 'Delimiter', '\n', 'MultipleDelimsAsOne', true);
cols = numel(regexp(C{1}{1}, '\s*\w+'));
%// Read the rest of the rows and store the data values in a matrix
C = textscan(fid, ['%*s', repmat('%f', 1, cols - 1)]);
A = [C{:}]; %// Store the data in a matrix
fclose(fid); %// Close the input file
The data is stored in matrix A.
From the readtable documentation (see http://www.mathworks.com/help/matlab/ref/readtable.html):
T = readtable(filename, 'ReadVariableNames', true) if the first row has the column headers
or
T = readtable(filename, 'ReadRowNames', true) if the first column has the row headers
Both options can be combined when the file has both kinds of header. You may also be interested in the 'HeaderLines' name-value pair if you'd like to drop more than just the first line.
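A rough sketch of how this could look for the sample file in the question follows. It writes the three lines to a temporary file first so that it runs on its own; the temporary file name and the explicit space delimiter are assumptions, not part of the original question. Once the table is loaded, elements can be read and modified by row and column name.

```matlab
% Write the sample from the question to a temporary file (hypothetical path)
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'data c1 c2 c3 c4\nr1 1 2 3 4\nr2 5 6 7 8\n');
fclose(fid);

% First row gives the variable (column) names, first column the row names
T = readtable(fname, 'Delimiter', ' ', ...
    'ReadVariableNames', true, 'ReadRowNames', true);

x = T{'r1', 'c2'};        % read an element by row and column name
T{'r2', 'c4'} = 80;       % modify an element by name

A = T{:, :};                            % plain numeric matrix
rowNames = T.Properties.RowNames;       % cell array of row names
colNames = T.Properties.VariableNames;  % cell array of column names

delete(fname);            % clean up the temporary file
```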
You could use importdata, for example, supposing the delimiter is "tab",
rawdata = importdata(filename, '\t');
row_names = rawdata.textdata(2:end,1);
col_names = rawdata.textdata(1, 2:end);
data_mat = rawdata.data;
row_names and col_names are cell arrays. If you would rather have each as a single string delimited by \t or ,, etc., you can use strjoin on them.
I have a file which contains data in the following format:
filename.jpg,132,234,234,345,4555,23333,344,...,333
I have put ... to mark the fact that I have a long sequence of integers. On each line I have a total of 132 integers.
I want to read the numbers in a matrix with 132 columns and as many rows as I have in the input file. How can I read this data with textscan function? How should I specify this type of format? I also want to read the first column of filenames into a cell array.
For the cell array I have used the following syntax:
fid = fopen(inputPath);
buffer = textscan(fid, '%s%*[^\n]', 'Delimiter', ',');
fclose(fid);
You can follow your first call to textscan with a csvread instead:
A = csvread('data.txt', 0, 1);
The two last parameters specify row and column at which your data starts. Your cell will contain the strings from the first column, A contains a matrix with the data.
Otherwise, if you really have to use textscan, build the format string separately:
fid = fopen('data.txt', 'r');
% create a string with as many %f as you need
fmt = ['%s' repmat('%f', 1, 132)];
buffer = textscan(fid, fmt, 'Delimiter', ',');
names = buffer{1};
A = [buffer{2:end}];
fclose(fid);
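To try the repmat approach without the real file, here is a scaled-down sketch that parses an inline string with three numeric columns instead of 132. The file names and values in the sample are invented, and nCols is a hypothetical helper variable.

```matlab
% Inline stand-in for the input file: a name followed by nCols integers
txt = sprintf('a.jpg,1,2,3\nb.jpg,4,5,6');
nCols = 3;   % 132 for the file described in the question

% One %s for the file name, then nCols copies of %f
fmt = ['%s' repmat('%f', 1, nCols)];
buffer = textscan(txt, fmt, 'Delimiter', ',');

names = buffer{1};        % cell array of file names
A = [buffer{2:end}];      % numeric matrix, one row per input line
```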