Import text file in MATLAB

I have a tab delimited text file with suffix .RAW.
How can I load the data from the file into a matrix in MATLAB?
I have found readtable, but it doesn't support files ending with suffix .RAW.
Do I really have to use fread, fscanf, etc. to simply load a text file into a matrix?

You can use the dlmread() function. It will read data from an ASCII text file into a matrix and let you define the delimiter yourself. The delimiter for tabs is '\t'.
>> M = dlmread('Data.raw', '\t')
M =
1 2 3
4 5 6
7 8 9
Just for your information, there is also the tdfread() function, but I do not recommend using it except in very specific cases; dlmread() is a much better option.

.RAW is a generic file extension. You should know the format of your RAW file (especially if it contains a combination of numbers, data structures, etc.). If it is a simple text file with a single 2D table, you can easily read it with fscanf, fread, fgetl, fgets, etc.
Here is a simple example for a 2D table (matrix):
Let's assume that each row of your table is separated from the next by a newline. We can read each row with fgetl() and then extract the numbers using str2num().
fid = fopen('YourTextFile.RAW');
Data = [];
i = 0;
while 1
    i = i + 1;
    tline = fgetl(fid);               % read one line as a character string
    if ~ischar(tline), break, end     % fgetl returns -1 (not a string) at end of file
    Data(i,:) = str2num(tline);       % convert the line into a row of numbers
end
fclose(fid);
disp(Data)
For more complex data structures, the code would need to be adapted.
For a 2D table (this special case), the simple loop above can just as easily be replaced by the dlmread() function.
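For instance, assuming the same tab-delimited .RAW file as above (the file name is just a placeholder), the whole loop collapses to a single call:
Data = dlmread('YourTextFile.RAW', '\t');   % read the whole numeric table at once
disp(Data)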

Related

load txt dataset into matlab in a matrix structure

I need to read a txt dataset and do some analytics on it with MATLAB. The structure of the txt file is like this:
ID Genre AgeScale
1 M 20-26
2 F 18-25
So, I want to load this txt file and build a matrix. I was wondering if someone could help me with this. I used the fopen function, but it gives me a single array, not a matrix with 3 columns.
MATLAB has an interactive data importer. Just type uiimport in the command window. It allows you to:
Name the variables based on the headings, as shown in your example. You can also change them manually.
Specify each variable's (column's) type, e.g. numeric, cell array of strings, etc.
Auto-generate an import script for the next use (if desired).
If it works for you then congratulations, you don't need to waste hours writing a data import script :)
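If you prefer to point the tool at a file directly, uiimport also accepts a file name (a minimal sketch; the file name is just a placeholder):
uiimport('YourDataset.txt')   % opens the interactive import tool preloaded with this file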
The fopen function only returns a file ID, not the content of the file. Once the file is open, you use the file ID to read it line by line and then parse each line with strsplit, using a space as the delimiter.
Here's one simple way of doing so:
fid = fopen('textfile.txt');
data = {};
tline = fgetl(fid);
n = 1;
while ischar(tline)
    % strtrim removes any trailing carriage return before splitting on spaces
    data(n,:) = strsplit(strtrim(tline), ' ');
    n = n + 1;
    tline = fgetl(fid);
end
fclose(fid);
Keep in mind that the matrix data is a cell array of strings, not numeric, so if you want to use the numeric values of your dataset, you'll need to look at str2double (or str2num) and at strtok to split the AgeScale strings on the '-' delimiter.
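As a rough sketch of that conversion (assuming the cell array data produced by the loop above, with the header in row 1; this uses strsplit instead of strtok):
ids      = str2double(data(2:end,1));                 % numeric ID column (row 1 holds the headers)
agePairs = cellfun(@(s) strsplit(s,'-'), data(2:end,3), 'UniformOutput', false);
ageLow   = cellfun(@(p) str2double(p{1}), agePairs);  % lower bound of each AgeScale range
ageHigh  = cellfun(@(p) str2double(p{2}), agePairs);  % upper bound of each AgeScale range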

Memory map file in MATLAB?

I have decided to use memmapfile because my data (typically 30 GB to 60 GB) is too big to fit in the computer's memory.
My data files consist of two columns of data that correspond to the outputs of two sensors, and I have them in both .bin and .txt formats.
m=memmapfile('G:\E-Stress Research\Data\2013-12-18\LD101_3\EPS/LD101_3.bin','format','int32')
m.data(1)
I used the above code to memory-map my data to a variable m, but I have no idea which data format to use ('int8', 'int16', 'int32', 'int64', 'uint8', 'uint16', 'uint32', 'uint64', 'single', or 'double').
In fact I tried all of the data formats MATLAB supports, but when I use m.data(index number) I never get a pair of numbers (the 2 columns of data), which is what I expected; the number I get also differs depending on the format I use.
If anyone has experience with memmapfile please help me.
Here are some smaller versions of my data files so people can understand how my data is structured:
cheers
James
memmapfile is designed for reading binary files, which is why you are having trouble with your text file. The data in there is characters, so you'll have to read them as characters and then parse them into numbers. More on that below.
The binary file appears to contain more than just a stream of floating point values written in binary format. I see identifiers (strings) and other things in the file as well. Your only hope of reading that is to contact the manufacturer of the device that created the binary file and ask them about how to read in such files. There'll probably be an SDK, or at least a description of the format. You might want to look into this as the floating point numbers in your text file might be truncated, i.e., you have lost precision compared to directly reading the binary representation of the floats.
Ok, so how to read your file with memmapfile? This post provides some hints.
So first we open your file as 'uint8' (note there is no 'char' option, so as a workaround we read the content of the file into a datatype of the same size):
m = memmapfile('RTL5_57.txt','Format','uint8'); % uint8 is default, you could leave that off
We can render the data read in as uint8 as characters by casting it to char:
c = char(m.Data(1:57)).' % read the first three lines. NB: transpose just for getting nice output, don't use it in your code
c =
0.398516 0.063440
0.399611 0.063284
0.398985 0.061253
As each line in your file has the same length (2*8 chars for the numbers, 1 tab and 2 chars for newline = 19 chars), we can read N lines from the file by reading N*19 values. So m.Data(1:19) gets you the first line, m.Data(20:38), the second line, and m.Data(20:57) the second and third lines. Read as much as you want at once.
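As a small sketch of that indexing (assuming the fixed 19-character lines described above), a block of lines can be pulled out like this:
lineLen   = 19;                                  % characters per line in this particular file
firstLine = 5;                                   % first line to read
nLines    = 3;                                   % how many lines to read
idx = (firstLine-1)*lineLen + 1 : (firstLine-1+nLines)*lineLen;
c   = char(m.Data(idx)).';                       % characters of lines 5 to 7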
Then we'll have to parse the read-in data into floating point numbers:
f = sscanf(c,'%f')
f =
0.3985
0.0634
0.3996
0.0633
0.3990
0.0613
All that's left now is to reshape them into your two-column format:
d = reshape(f,2,[]).'
d =
0.3985 0.0634
0.3996 0.0633
0.3990 0.0613
Easier ways than using memmapfile:
You don't need to use memmapfile to solve your problem, and I think it makes things more complicated. You can simply use fopen followed by fread:
fid = fopen('RTL5_57.txt');
c = fread(fid,Nlines*19,'*char');
% now sscanf and reshape as above
% NB: one can read the values in the text file directly with f = fscanf(fid,'%f',Nlines*2).
% However, in testing, I have found calling fread followed by sscanf to be faster
% which will make a significant difference when reading such large files.
Using this you can read Nlines pairs of values at a time, process them, and simply call fread again to read the next Nlines. fread remembers where it is in the file (as does fscanf), so simply use the same call to get the next lines. It's thus easy to write a loop that processes the whole file, testing with feof(fid) whether you are at the end of the file.
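A minimal sketch of such a loop (the chunk size and the processing step are placeholders):
Nlines = 10000;
fid = fopen('RTL5_57.txt');
while ~feof(fid)
    c = fread(fid, Nlines*19, '*char');   % read up to Nlines lines worth of characters
    f = sscanf(c, '%f');                  % parse them into floating point numbers
    d = reshape(f, 2, []).';              % two columns, one row per line
    % ... process d here ...
end
fclose(fid);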
An even easier way is suggested here: use textscan. To slightly adapt their example code:
Nlines = 10000;
% describe the format of the data
% for more information, see the textscan reference page
format = '%f\t%f';
fid = fopen('RTL5_57.txt');
while ~feof(fid)
    C = textscan(fid, format, Nlines, 'CollectOutput', true);
    d = C{1}; % immediately clear C at this point if you need the memory!
    % process d
end
fclose(fid);
Note again, however, that fread followed by sscanf will be fastest. Be aware, though, that the fread method will fail as soon as one line in the text file doesn't exactly match the fixed format, whereas textscan is forgiving of whitespace changes and is thus more robust.

How to load data for Classification in Matlab

I have a text file containing thousands of attributes (each column indicates an attribute) and a column that shows the labels of each row. All data is numeric except the last column, which contains the labels as strings. I want to use MATLAB classification functions such as gscatter() to classify the data. The problem is that when I use load filename in MATLAB to load my data, I get this error (in which "no" is one of the labels):
Unknown text on line number 1 of ASCII file C:\Program Files\MATLAB\R2011b\train\train.txt
"no".
In fact I do not know how to load my data in matlab to be able to use matlab functions to classify the data.
load only works for .mat files and for text files containing purely numeric data, which is why you get an error.
There are a number of functions which do read text files though.
Depending on the format of your data files, you could use one of the following:
textread is pretty general but requires you to supply the format and to open and close the file.
csvread reads only numeric, comma-separated value, but you don't have to provide a format.
importdata is very general and convenient
fscanf is similar to textread
Given the number of attributes you're dealing with, I'd definitely go with importdata myself.
Here is an example
train.txt
1,2,3,4,5,6,no
2,3,4,5,6,7,yes
myLoadScript.m
numAttribs = 6; %# number of attributes (excluding the label)
frmt = [repmat('%f ',1,numAttribs) '%s'];
fid = fopen('train.txt', 'rt');
C = textscan(fid, frmt, 'Delimiter',',', 'CollectOutput',1);
fclose(fid);
The result:
>> C{1}
ans =
1 2 3 4 5 6
2 3 4 5 6 7
>> C{2}
ans =
'no'
'yes'
Should be easy to adapt to work on your specific file format...
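For example, to feed the result into gscatter (a sketch that plots just the first two attributes against the string labels; gscatter requires the Statistics Toolbox):
X      = C{1};                    % numeric attributes
labels = C{2};                    % cell array of label strings ('no'/'yes')
gscatter(X(:,1), X(:,2), labels)  % scatter plot of attribute 1 vs attribute 2, grouped by label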

Fastest way to import CSV files in MATLAB

I've written a script that saves its output to a CSV file for later reference, but my second script, which imports the data, takes an unreasonably long time to read it back in.
The data is in the following format:
Item1,val1,val2,val3
Item2,val4,val5,val6,val7
Item3,val8,val9
where the headers are in the left-most column and the data values take up the remainder of the row. One major difficulty is that the arrays of data values can have different lengths for each test item. I'd save it as a structure, but I need to be able to edit it outside the MATLAB environment, since I sometimes have to delete rows of bad data on a computer that doesn't have MATLAB installed. So really, part one of my question is: should I save the data in a different format?
Second part of the question:
I've tried importdata, csvread, and dlmread, but I'm not sure which is best, or if there's a better solution. Right now I'm using my own script using a loop and fgetl, which is horribly slow for large files. Any suggestions?
function [data,headers] = csvreader(filename) %V1_1
fid = fopen(filename,'r');
data = {};
headers = {};
count = 1;
while 1
    textline = fgetl(fid);
    if ~ischar(textline), break, end
    nextchar = textline(1);
    idx = 1;
    while nextchar ~= ','
        headers{count}(idx) = textline(1);
        idx = idx + 1;
        textline(1) = [];
        nextchar = textline(1);
    end
    textline(1) = [];
    data{count} = str2num(textline);
    count = count + 1;
end
fclose(fid);
(I know this is probably terribly written code - I'm an engineer, not a programmer, please don't yell at me - any suggestions for improvement would be welcome, though.)
It would probably make the data easier to read if you could pad the file with NaN values when your first script creates it:
Item1,1,2,3,NaN
Item2,4,5,6,7
Item3,8,9,NaN,NaN
or you could even just print empty fields:
Item1,1,2,3,
Item2,4,5,6,7
Item3,8,9,,
Of course, in order to pad properly you would need to know beforehand what the maximum number of values across all the items is. With either format above, you could then use one of the standard file reading functions, like TEXTSCAN for example:
>> fid = fopen('uneven_data.txt','rt');
>> C = textscan(fid,'%s %f %f %f %f','Delimiter',',','CollectOutput',1);
>> fclose(fid);
>> C{1}
ans =
'Item1'
'Item2'
'Item3'
>> C{2}
ans =
1 2 3 NaN %# TEXTSCAN sets empty fields to NaN anyway
4 5 6 7
8 9 NaN NaN
Instead of parsing the string textline one character at a time, you could use strtok to break the string up, for example:
stringParts = {};
tline = fgetl(fid);
if ~ischar(tline), break, end
i = 1;
while 1
    [stringParts{i}, r] = strtok(tline, ',');
    tline = r;
    i = i + 1;
    if isempty(r), break; end
end
% store the header
headers{count} = stringParts{1};
% convert the data into numbers
for j = 2:length(stringParts)
    data{count}(j-1) = str2double(stringParts{j});
end
count = count + 1;
I've had the same problem reading CSV data in MATLAB, and I was surprised by how little support there is for this, but then I found the Import Data tool. I'm on R2015b.
On the top bar in the "Home" tab, click on "Import Data" and choose the file you'd like to read. An app window will come up like this:
[Screenshot of the Import Data tool]
Under "Import Selection" you have the option to "generate function", which gives you quite a bit of customization options, including how to fill empty cells, and what you'd like the output data structure to be. Plus it's written by MathWorks, so it's probably utilizing the fastest available method to read csv files. It was almost instantaneous on my file.
Q1) If you know the maximum number of columns, you can fill the empty entries with NaN.
Also, if all values are numerical, do you really need the "Item#" column? If yes, you could use just "#", so that all the data is numerical.
Q2) The fastest way to read numeric data from a file without MEX files is csvread.
I try to avoid using strings in CSV files, but if I have to, I use my csv2cell function:
http://www.mathworks.com/matlabcentral/fileexchange/20135-csv2cell

How do you create a matrix from a text file in MATLAB?

I have a text file which has 4 columns, each column having 65536 data points. Every element in the row is separated by a comma. For example:
X,Y,Z,AU
4010.0,3210.0,-440.0,0.0
4010.0,3210.0,-420.0,0.0
etc.
So, I have 65536 rows, each row having 4 data values as shown above. I want to convert it into a matrix. I tried importing the data from the text file into an Excel file, because that way it's easy to create a matrix, but I lost more than half the data.
If all the entries in your file are numeric, you can simply use a = load('file.txt'). It should create a 65536-by-4 matrix a. It is even easier than csvread.
Have you ever tried importdata?
The only parameters you need are the file name and the delimiter:
>> tmp_data = importdata('your_file.txt',',')
tmp_data =
data: [2x4 double]
textdata: {'X' 'Y' 'Z' 'AU'}
colheaders: {'X' 'Y' 'Z' 'AU'}
>> tmp_data.data
ans =
4010 3210 -440 0
4010 3210 -420 0
>> tmp_data.textdata
ans =
'X' 'Y' 'Z' 'AU'
Instead of messing with Excel, you should be able to read the text file directly into MATLAB (using the functions FOPEN, FGETL, FSCANF, and FCLOSE):
fid = fopen('file.dat','rt');                  %# Open the data file
headerChars = fgetl(fid);                      %# Read the first line of characters (the header)
data = fscanf(fid,'%f,%f,%f,%f',[4 inf]).';    %# Read the data into a 65536-by-4 matrix
fclose(fid);                                   %# Close the data file
The easiest way to do it would be to use MATLAB's csvread function.
There is also this tool which reads CSV files.
You could do it yourself without too much difficulty either: just loop over each line in the file, split it on commas, and put the values in your array.
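A rough sketch of that do-it-yourself loop (the file name is a placeholder; it skips the X,Y,Z,AU header line):
fid = fopen('file.txt','rt');
fgetl(fid);                                          % skip the header line
M = zeros(0,4);
tline = fgetl(fid);
while ischar(tline)
    M(end+1,:) = str2double(strsplit(tline, ','));   %#ok<AGROW> split on commas, convert to numbers
    tline = fgetl(fid);
end
fclose(fid);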
Suggest you familiarize yourself with dlmread and textscan.
dlmread is like csvread but because it can handle any delimiter (tab, space, etc), I tend to use it rather than csvread.
textscan is the real workhorse: lots of options, plus it works on open files and is a little more robust at handling "bad" input (e.g. non-numeric data in the file). It can be used like fscanf in gnovice's suggestion, but I think it is faster (don't quote me on that though).
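For this particular file, a textscan version might look like this (a sketch, assuming the header line shown in the question):
fid = fopen('file.txt','rt');
C = textscan(fid, '%f%f%f%f', 'Delimiter',',', 'HeaderLines',1, 'CollectOutput',true);
fclose(fid);
M = C{1};   % 65536-by-4 double matrix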