Reading one column from a text file - matlab

What is the equivalent of fgetl and fgets in MATLAB for reading one column at a time (not a line) from a text file?

You cannot avoid reading the file. However, if your dataset is large, you can tell MATLAB to ignore the irrelevant parts while reading the file.
For instance, if your columns are space delimited, and you want to read the floating-point numbers in the first column, you can try the following:
fid = fopen('input.txt');
C = textscan(fid, '%f %*[^\n]');
C = C{:};
fclose(fid);
This still reads the entire file, but stores only the first column in memory.
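The same idea can be extended to a column other than the first by skipping the leading fields with %*f. The following is a minimal sketch, assuming a space-delimited, purely numeric file; the temporary file, the variable k, and the variable names are made up for illustration:

```matlab
% Hypothetical demo file with three space-delimited numeric columns
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%g %g %g\n', [1 2 3; 4 5 6; 7 8 9].');
fclose(fid);

k = 2;  % 1-based index of the column to keep (assumption for this demo)

% Skip k-1 numeric fields, keep one, then ignore the rest of each line
fmt = [repmat('%*f ', 1, k-1) '%f %*[^\n]'];

fid = fopen(fname);
C = textscan(fid, fmt);
fclose(fid);
col = C{1};      % the k-th column as a column vector
delete(fname);   % remove the demo file
```

As with the original snippet, the whole file is still scanned; only the selected column is kept in memory.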

Related

Read the data from specific line in MATLAB

I have a sequence of data files (".tab" files) with more than 11100 rows and 236 columns. The data begins at line 297 in one file and at line 299 in another. How can I read the data starting from row 297 of each file in MATLAB R2014a?
I am not quite sure, but it seems that a typical machine's memory can handle a file of that size. In that case, you can use the built-in MATLAB functions textscan or textread.
Nonetheless, if you really cannot import all of your data into the MATLAB environment, set the HeaderLines argument of textscan to the number of lines to skip before the line of interest. A simple example can be found in the MATLAB documentation, or:
SelectedData = textscan(ID,formatSpec,'HeaderLines',296); % Ignore 296 first lines of the data
First of all, I strongly recommend reviewing the MATLAB documentation. Assuming you have several files in hand (stored in fileNames):
for i = 1:numel(fileNames)
    ID = fopen(fileNames{i});
    formatSpec = '%s %[^\n]'; % Modify this based on your file structure
    SelectedData{i} = textscan(ID,formatSpec,'HeaderLines',296);
    fclose(ID);
end
SelectedData is a cell array with one entry per file; entry i holds the data that textscan extracted from fileNames{i}.
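Since the question mentions that the data starts on a different line in each file, one option is to keep a per-file header count next to the file names. This is only a sketch under assumptions: the file names are temporary demo files, headerCounts is a hypothetical variable, and the '%f %f' format stands in for the real 236-column layout. For the files in the question the counts would be 296 and 298.

```matlab
% Hypothetical demo: two small tab-delimited files with different header lengths
fileNames    = {[tempname '.tab'], [tempname '.tab']};
headerCounts = [2 3];   % lines to skip in each file (assumed to be known)

for i = 1:numel(fileNames)
    fid = fopen(fileNames{i}, 'w');
    fprintf(fid, repmat('header\n', 1, headerCounts(i)));  % header lines
    fprintf(fid, '%g\t%g\n', [1 2; 3 4].');                % two data rows
    fclose(fid);
end

SelectedData = cell(size(fileNames));
for i = 1:numel(fileNames)
    fid = fopen(fileNames{i});
    % Skip a different number of header lines for each file
    C = textscan(fid, '%f %f', 'HeaderLines', headerCounts(i), ...
                 'CollectOutput', true);
    fclose(fid);
    SelectedData{i} = C{1};   % numeric matrix for file i
    delete(fileNames{i});     % remove the demo file
end
```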

I want matlab to read a comma separated txt file with 100s of columns

On http://www.mathworks.com/help/matlab/ref/textscan.html, I can see the suggestion:
fileID = fopen('data3.csv');
C = textscan(fileID,'%f %f %f %f','Delimiter',',',...
'MultipleDelimsAsOne',1);
fclose(fileID);
celldisp(C)
I am not sure whether textscan can also read .txt files, but I can't really write out 100s of '%f's. Is there a way to do this by giving textscan the dimensions of the matrix in my .txt file? Thanks.
If you have a file that is only numbers, and the text is comma separated (.csv), then you can use csvread:
num_headerlines = 1
C = csvread('C:\users\smith\Documents\data3.csv', num_headerlines, 0)
The last two arguments here are the row and column at which to begin reading, and unlike almost everything else in MATLAB, they are 0-indexed: to start on the first column you pass a 0, and to start on the second row you pass a 1. This will read as many columns as you have, without needing a long format specifier.
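If textscan is preferred anyway, the long format string does not have to be typed by hand; it can be built with repmat once the number of columns is known. A minimal sketch, assuming a purely numeric, comma-separated file with no header line (the demo file and variable names are invented for illustration):

```matlab
% Hypothetical demo file: a 5-by-5 numeric matrix written as comma-separated text
M = magic(5);
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%g,%g,%g,%g,%g\n', M.');   % transpose so each file row is a row of M
fclose(fid);

fid = fopen(fname);
firstLine = fgetl(fid);                  % peek at the first line
nCols = sum(firstLine == ',') + 1;       % number of fields = number of commas + 1
frewind(fid);                            % go back to the start of the file

% Build '%f%f...%f' with nCols fields instead of typing it out
C = textscan(fid, repmat('%f', 1, nCols), 'Delimiter', ',', ...
             'CollectOutput', true);
fclose(fid);
A = C{1};                                % nRows-by-nCols numeric matrix
delete(fname);                           % remove the demo file
```

The file extension does not matter to textscan; it reads whatever text the file identifier points to.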

Confused with .tsv files in MATLAB (converting to a Matrix?)

I have a .tsv file that I wish to open in MATLAB, however I am having several problems with this.
I have tried the following
fid = fopen('data.tsv');
C = textscan(fid, ['%s' repmat('%f',1,8)], 'HeaderLines', 1);
fclose(fid);
and got some weird values that had nothing to do with my file. I also tried:
data = dlmread('data.tsv', '\t');
and got this
Error using dlmread (line 139)
Mismatch between file and format string.
Trouble reading number from file (row 1u, field 1u) ==> Participant Assessment
Experiment Block Trial
Answer Reaction Timestamp Free Response\n
Is there some way I can get it to ignore the header, or am I doing it totally wrong?
With dlmread you can specify where to start reading in the file. This is one of the few times that MATLAB indexing begins at 0 - [0,0] is the first row, first column. Therefore, to ignore the first row (containing your header):
data = dlmread('data.tsv','\t', 1, 0);
This will only work if all the values (other than the header lines you skip) are numeric.
Your example with textscan also looks fine to me (provided that the format supplied is correct and there is indeed only one header line). C will be a cell array; to obtain the data from each column use C{n} where n is the column number.
Rather than skipping the header line, it's sometimes useful to just read it in to a separate value:
fid = fopen('data.tsv');
C_header = textscan(fid, '%s',9);
C = textscan(fid, ['%s' repmat('%f',1,8)]);
fclose(fid);
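One thing to watch for is that a header such as "Free Response" contains a space, so reading headers with '%s' and the default whitespace delimiter would split it into two entries. A sketch of one way around this, assuming a tab-delimited file; the demo file contents and column names below are made up:

```matlab
% Hypothetical demo file: tab-delimited, one header line, one text column
fname = [tempname '.tsv'];
fid = fopen(fname, 'w');
fprintf(fid, 'Participant\tReaction Time\tScore\n');
fprintf(fid, 'p1\t0.5\t3\n');
fprintf(fid, 'p2\t0.7\t4\n');
fclose(fid);

fid = fopen(fname);
headerLine = fgetl(fid);                        % read the header as one line
headers = strsplit(headerLine, sprintf('\t'));  % split on tabs only, keeping spaces
% Continue from the second line, telling textscan that fields are tab-separated
C = textscan(fid, ['%s' repmat('%f', 1, 2)], 'Delimiter', '\t');
fclose(fid);
delete(fname);                                  % remove the demo file
```

Here C{1} is a cell array of strings and C{2}, C{3} are numeric column vectors.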

Textscan generates a vector twice the expected size

I want to load a csv file in a matrix using matlab.
I used the following code:
formatSpec = ['%*f', repmat('%f',1,20)];
fid = fopen(filename);
X = textscan(fid, formatSpec, 'Delimiter', ',', 'CollectOutput', 1);
fclose(fid);
X = X{1};
The csv file has 1000 rows and 21 columns.
However, the matrix X generated has 2000 rows and 20 columns.
I tried using different delimiters like '\t' or '\n', but it doesn't change.
When I displayed X, I noticed that it displayed the correct csv file but with extra rows of zeros every 2 rows.
I also tried adding the 'HeaderLines' parameters:
`X = textscan(fid, formatSpec1, 'Delimiter', '\n', 'CollectOutput', 1, 'HeaderLines', 1);`
but this time, the result is an empty matrix.
Am I missing something?
EDIT: @horchler
I could read with no problem the 'test.csv' file.
There is no extra comma at the end of each row. I generated my csv file with a python script: I read the rows of another csv file, modified these (selecting some of them and doing arithmetic operations on them) and wrote the new rows on another csv file. In order to do this, I converted each element of the first csv file into floats...
New Edit:
Reading the textscan documentation more carefully, I think the problem is that my input file is neither a textfile nor a str, but a file containing floats
EDIT: three lines from the file
0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,1,0,0,0,2
1,-0.3834323,-1.92452324171,-1.2453254094,0.43455627857,-0.24571121,0.4340657,1,1,0,0,0,0.3517396202,1,0,0,0.3558122164,0.2936975319,0.4105696144,0,1,0
-0.78676,-1.09767,0.765554578,0.76579043,0.76,1,0,0,323124.235998,1,0,0,0,1,0,0,1,0,0,0,2
How about using regex?
X = [];
fid = fopen(filename);
while true
    fl = fgetl(fid);
    if ~ischar(fl), break, end
    r = regexp(fl, '([-]*\d+[.]*\d*)', 'match');
    % Keep the first 21 matches and convert them to numbers: your second
    % line somehow has 22 elements, and all lines must have the same number
    % of elements or an error will be thrown:
    % Error: CAT arguments dimensions are not consistent.
    X = [X; str2double(r(1:21))];
end
fclose(fid);
Using csvread to read a csv file seems a good option. However, I also tend to read csv files with textscan as files are sometimes badly written. Having more options to read them is therefore necessary.
I face a reading problem like yours when I think the file is written a certain way but it is actually written another way. To debug it I use fgetl and print, for each line read, both the output of fgetl and its double version (see the example below). Examining the double version, you may find which character causes a problem.
In your case, I would first look for multiple occurrences of delimiters (',' and '\t') and, in textscan, I would activate the option 'MultipleDelimsAsOne' (while turning off 'CollectOutput').
fid = fopen(filename);
tline = fgetl(fid);
while ischar(tline)
    disp(tline);
    double(tline)
    pause;
    tline = fgetl(fid);
end
fclose(fid);
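A quicker check along the same lines is to count the fields on every line and look for rows that differ from the rest, since a row with an extra field (like the 22-element line in the sample above) is one plausible reason for textscan to misalign its output. A rough sketch, with a made-up demo file and variable names:

```matlab
% Hypothetical demo file in which the second line has one field too many
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '1,2,3\n1,2,3,4\n1,2,3\n');
fclose(fid);

counts = [];                                  % number of fields on each line
fid = fopen(fname);
tline = fgetl(fid);
while ischar(tline)
    counts(end+1) = sum(tline == ',') + 1;    % fields = commas + 1
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);                                % remove the demo file

badLines = find(counts ~= mode(counts));      % lines that differ from the most common count
```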

How do you create a matrix from a text file in MATLAB?

I have a text file which has 4 columns, each column having 65536 data points. Every element in the row is separated by a comma. For example:
X,Y,Z,AU
4010.0,3210.0,-440.0,0.0
4010.0,3210.0,-420.0,0.0
etc.
So, I have 65536 rows, each row having 4 data values as shown above. I want to convert it into a matrix. I tried importing data from the text file to an Excel file, because that way it's easy to create a matrix, but I lost more than half the data.
If all the entries in your file are numeric, you can simply use a = load('file.txt'). It should create a 65536x4 matrix a. It is even easier than csvread. Note that this requires the file to contain only numbers, so the X,Y,Z,AU header line would have to be removed first.
Have you ever tried using 'importdata'?
The only parameters you need are the file name and the delimiter.
>> tmp_data = importdata('your_file.txt',',')
tmp_data =
data: [2x4 double]
textdata: {'X' 'Y' 'Z' 'AU'}
colheaders: {'X' 'Y' 'Z' 'AU'}
>> tmp_data.data
ans =
4010 3210 -440 0
4010 3210 -420 0
>> tmp_data.textdata
ans =
'X' 'Y' 'Z' 'AU'
Instead of messing with Excel, you should be able to read the text file directly into MATLAB (using the functions FOPEN, FGETL, FSCANF, and FCLOSE):
fid = fopen('file.dat','rt');  % Open the data file
headerChars = fgetl(fid);      % Read the first line of characters
data = fscanf(fid,'%f,%f,%f,%f',[4 inf]).';  % Read the data into a 65536-by-4 matrix
fclose(fid);                   % Close the data file
The easiest way to do it would be to use MATLAB's csvread function.
There is also this tool which reads CSV files.
You could do it yourself without too much difficulty either: Just loop over each line in the file and split it on commas and put it in your array.
Suggest you familiarize yourself with dlmread and textscan.
dlmread is like csvread but because it can handle any delimiter (tab, space, etc), I tend to use it rather than csvread.
textscan is the real workhorse: lots of options, + it works on open files and is a little more robust to handling "bad" input (e.g. non-numeric data in the file). It can be used like fscanf in gnovice's suggestion, but I think it is faster (don't quote me on that though).
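For the file in the question, a textscan call could look roughly like the sketch below. It assumes one header line and four comma-separated numeric columns; the demo file only reproduces the two sample rows shown in the question, and the variable names are illustrative:

```matlab
% Hypothetical demo file reproducing the sample rows from the question
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'X,Y,Z,AU\n');
fprintf(fid, '4010.0,3210.0,-440.0,0.0\n');
fprintf(fid, '4010.0,3210.0,-420.0,0.0\n');
fclose(fid);

fid = fopen(fname);
% Skip the header line and gather the four numeric columns into one matrix
C = textscan(fid, '%f%f%f%f', 'Delimiter', ',', 'HeaderLines', 1, ...
             'CollectOutput', true);
fclose(fid);
data = C{1};     % N-by-4 matrix (2-by-4 for this demo file)
delete(fname);   % remove the demo file
```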