load txt dataset into matlab in a matrix structure - matlab

I need to read a txt dataset and do some analytics by matlab. the structure of the txt file is like this:
ID Genre AgeScale
1 M 20-26
2 F 18-25
So, I want to load this txt file and build a matrix. I was wondering if someone could help me with this. I used fopen function but it gives me a single array not a matrix with 3 columns.

MATLAB has an interactive data importer. Just type uiimport in the command window. It allows you to:
Name the variable based on heading as shown in your example. You can also manually change them.
Specify variable (column) type, e.g. numeric, cell array of strings, etc.
Auto generate an import script for next use (if desired)
If it works for you then congratulations, you don't need to waste hours to write an data import script :)

Function fopen only returns the file ID and not the content of the file. Once you open the file you use the file ID to read line by line and then parse each line with strsplit using space as the delimiter.
Here's one simple way of doing so:
fid = fopen('textfile.txt');
tline = fgetl(fid);
n = 1;
while ischar(tline)
data(n,:) = strsplit(tline(1:end-1),' ');
n=n+1;
tline = fgetl(fid);
end
fclose(fid);
Keep in mind that the matrix data is type string and not numeric, so if you want to use the numeric values of your dataset, you'll need to take a look at the functions str2num (str2double in newer versions) and strtok to split the AgeScale strings with delimiter '-'.

Related

Read the data from specific line in MATLAB

I have a sequence of data files(".tab" files) with more than 11100 rows and 236 columns. Data begins from 297th line in one file and from 299th line in another file. How can I read the data from 297th row of each file in MATLAB R2014a?
I am not quite sure, bu it seems that a typical machine's memory can handle such a file size. In that case, you can use textscan or textread MATLAB built-in functions.
Nonetheless, if you really cannot import your data into MATLAB environment, set HeaderLines argument of textscan to the line of interest. A simple example can be found in MATLAB documentations, or:
SelectedData = textscan(ID,formatSpec,'HeaderLines',296); % Ignore 296 first lines of the data
First of all, I strongly recommend to review the MATLAB documentation. Assuming you have several files in hand (stored in fileNames:
for i = 1:numel(fileNames)
ID = fopen(fileNames{i});
formatSpec = '%s %[^\n]'; % Modify this based on your file structure
SelectedData{i} = textscan(ID,formatSpec,'HeaderLines',296);
fclose(ID);
end
SelectedData is a cell string containing all your data extracted from corresponding data (fileNames)

Import text file in MATLAB

I have a tab delimited text file with suffix .RAW.
How can I load the data from the file into a matrix in MATLAB?
I have found readtable, but it doesn't support files ending with suffix .RAW.
Do I really have to use fread, fscanf, etc. to simply load a text file into a matrix?
You can use the dlmread() function. It will read data from an ASCII text file into a matrix and let you define the delimiter yourself. The delimiter for tabs is '\t'.
>> M = dlmread('Data.raw', '\t')
M =
1 2 3
4 5 6
7 8 9
Just for your information there is also the tdfread() function but I do not recommend using it except in very specific cases. dlmread() is a much better option.
.RAW is a generic file extention. You should know the format of your RAW file (especially if your file contains a combination of numbers, data structures etc). If it is a simple text file with a single 2D table, you can easily read it with fscanf, fread, fgetl, fgets, etc
Here is a simple example for a 2D table (matrix):
Let's assume that each row of your table is separated by a carriage return from its following rows. We can read each row by fgetl() and then extract numbers using str2num().
fid=fopen('YourTextFile.RAW');
Data=[];
i = 0;
while 1
i = i + 1;
tline = fgetl(fid);
if ~ischar(tline), break, end
Data(i,:) = str2num(tline);
end
fclose(fid);
disp(Data)
For more complex data structure, the code should be changed.
For a 2D table (a special case) the above simple code can be easily exchanged by dlmread() function.

Matrix segmentation into files in Matlab

I have a very large matrix (M X N). I want to divide matrix into 10 equal parts (almost) and save each of them into a separate file say A1.txt, A2.txt, etc. or .mat format. How can I do this ?
Below is a code to divide a matrix into 10 equal parts and data_size is (M / 10).
for i=1:10
if i==1
data = DATA(1:data_size,:);
elseif i==10
data = DATA((i-1)*data_size+1:end,:);
else
data = DATA((i-1)*data_size+1: i*data_size,:);
end
save data(i).mat data
% What should I write here in order to save data into separate file data1.mat, data2.mat etc.
end
You said you wanted it in either txt format or mat format. I'll provide both solutions, and some of this is attributed to Daniel in his comment in your post above.
Saving as a text file
You can use fopen to open a file up for writing. This returns an ID to the file that you want to write to. After this, use fprintf and specify the ID to the file that you want to write to, and the data you want to write to this file. As such, with sprintf, generate the text file name you want, then use fprintf to write data to your file. It should be noted that writing matrices to fprintf in MATLAB assume column major format. If you don't want your data written this way and want it done in row-major, you need to transpose your data before you write this to file. I'll provide both methods in the code depending on what you want.
After you're done, use fclose to close the file noting that you have finished writing to it. Therefore, you would do this:
for i=1:10
if i==1
data = DATA(1:data_size,:);
elseif i==10
data = DATA((i-1)*data_size+1:end,:);
else
data = DATA((i-1)*data_size+1: i*data_size,:);
end
filename = sprintf('A%d.txt', i); %// Generate file name
fid = fopen(filename, 'w'); % // Open file for writing
fwrite(fid, '%f ', data.'); %// Write to file - Transpose for row major!
%// fwrite(fid, '%f ', data); %// Write to file - Column major!
fclose(fid); %// Close file
end
Take note that I space separated the numbers so you can open up the file and see how these values are written accordingly. I've also used the default precision and formatting by just using %f. You can play around with this by looking at the fprintf documentation and customizing the precision and leading zero formatting to your desire.
Saving to a MAT file
This is actually a more simpler approach. You would still use sprintf to save your data, then use the save command to save your workspace variables to file. Therefore, your loop would be this:
for i=1:10
if i==1
data = DATA(1:data_size,:);
elseif i==10
data = DATA((i-1)*data_size+1:end,:);
else
data = DATA((i-1)*data_size+1: i*data_size,:);
end
filename = sprintf('A%d.mat', i); %// Generate file name
save(filename, 'data');
end
Take note that the variable you want to save must be a string. This is why you have to put single quotes around the data variable as this is the variable you are writing to file.
You can use
save(['data' num2str(i) '.mat'], 'data');
where [ ] is used to concatenate strings and num2str to convert an integer to a string.

Fastest way to import CSV files in MATLAB

I've written a script that saves its output to a CSV file for later reference, but the second script for importing the data takes an ungainly amount of time to read it back in.
The data is in the following format:
Item1,val1,val2,val3
Item2,val4,val5,val6,val7
Item3,val8,val9
where the headers are on the left-most column, and the data values take up the remainder of the row. One major difficulty is that the arrays of data values can be different lengths for each test item. I'd save it as a structure, but I need to be able to edit it outside the MATLAB environment, since sometimes I have to delete rows of bad data on a computer that doesn't have MATLAB installed. So really, part one of my question is: Should I save the data in a different format?
Second part of the question:
I've tried importdata, csvread, and dlmread, but I'm not sure which is best, or if there's a better solution. Right now I'm using my own script using a loop and fgetl, which is horribly slow for large files. Any suggestions?
function [data,headers]=csvreader(filename); %V1_1
fid=fopen(filename,'r');
data={};
headers={};
count=1;
while 1
textline=fgetl(fid);
if ~ischar(textline), break, end
nextchar=textline(1);
idx=1;
while nextchar~=','
headers{count}(idx)=textline(1);
idx=idx+1;
textline(1)=[];
nextchar=textline(1);
end
textline(1)=[];
data{count}=str2num(textline);
count=count+1;
end
fclose(fid);
(I know this is probably terribly written code - I'm an engineer, not a programmer, please don't yell at me - any suggestions for improvement would be welcome, though.)
It would probably make the data easier to read if you could pad the file with NaN values when your first script creates it:
Item1,1,2,3,NaN
Item2,4,5,6,7
Item3,8,9,NaN,NaN
or you could even just print empty fields:
Item1,1,2,3,
Item2,4,5,6,7
Item3,8,9,,
Of course, in order to pad properly you would need to know what the maximum number of values across all the items is before hand. With either format above, you could then use one of the standard file reading functions, like TEXTSCAN for example:
>> fid = fopen('uneven_data.txt','rt');
>> C = textscan(fid,'%s %f %f %f %f','Delimiter',',','CollectOutput',1);
>> fclose(fid);
>> C{1}
ans =
'Item1'
'Item2'
'Item3'
>> C{2}
ans =
1 2 3 NaN %# TEXTSCAN sets empty fields to NaN anyway
4 5 6 7
8 9 NaN NaN
Instead of parsing the string textline one character at a time. You could use strtok to break the string up for example
stringParts = {};
tline = fgetl(fid);
if ~ischar(tline), break, end
i=1;
while 1
[stringParts{i},r]=strtok(tline,',');
tline=r;
i=i+1;
if isempty(r), break; end
end
% store the header
headers{count} = stringParts{1};
% convert the data into numbers
for j=2:length(stringParts)
data{count}(j-1) = str2double(stringParts{j});
end
count=count+1;
I've had the same problem with reading csv data in Matlab, and I was surprised by how little support there is for this, but then I just found the import data tool. I'm in r2015b.
On the top bar in the "Home" tab, click on "Import Data" and choose the file you'd like to read. An app window will come up like this:
Import Data tool screenshot
Under "Import Selection" you have the option to "generate function", which gives you quite a bit of customization options, including how to fill empty cells, and what you'd like the output data structure to be. Plus it's written by MathWorks, so it's probably utilizing the fastest available method to read csv files. It was almost instantaneous on my file.
Q1) If you know the max number of columns you can fill empty entries with NaN
Also, if all values are numerical, do you really need "Item#" column? If yes, you can use only "#", so all data is numerical.
Q2) The fastest way to read num. data from a file without mex-files is csvread.
I try to avoid using strings in csv files, but if I have to, I use my csv2cell function:
http://www.mathworks.com/matlabcentral/fileexchange/20135-csv2cell

How do you create a matrix from a text file in MATLAB?

I have a text file which has 4 columns, each column having 65536 data points. Every element in the row is separated by a comma. For example:
X,Y,Z,AU
4010.0,3210.0,-440.0,0.0
4010.0,3210.0,-420.0,0.0
etc.
So, I have 65536 rows, each row having 4 data values as shown above. I want to convert it into a matrix. I tried importing data from the text file to an excel file, because that way its easy to create a matrix, but I lost more than half the data.
If all the entries in your file are numeric, you can simply use a = load('file.txt'). It should create a 65536x4 matrix a. It is even easier than csvread
Have you ever tried using 'importdata'?
The parameters you need only file name and delimiter.
>> tmp_data = importdata('your_file.txt',',')
tmp_data =
data: [2x4 double]
textdata: {'X' 'Y' 'Z' 'AU'}
colheaders: {'X' 'Y' 'Z' 'AU'}
>> tmp_data.data
ans =
4010 3210 -440 0
4010 3210 -420 0
>> tmp_data.textdata
ans =
'X' 'Y' 'Z' 'AU'
Instead of messing with Excel, you should be able to read the text file directly into MATLAB (using the functions FOPEN, FGETL, FSCANF, and FCLOSE):
fid = fopen('file.dat','rt'); %# Open the data file
headerChars = fgetl(fid); %# Read the first line of characters
data = fscanf(fid,'%f,%f,%f,%f',[4 inf]).'; %'# Read the data into a
%# 65536-by-4 matrix
fclose(fid); %# Close the data file
The easiest way to do it would be to use MATLAB's csvread function.
There is also this tool which reads CSV files.
You could do it yourself without too much difficulty either: Just loop over each line in the file and split it on commas and put it in your array.
Suggest you familiarize yourself with dlmread and textscan.
dlmread is like csvread but because it can handle any delimiter (tab, space, etc), I tend to use it rather than csvread.
textscan is the real workhorse: lots of options, + it works on open files and is a little more robust to handling "bad" input (e.g. non-numeric data in the file). It can be used like fscanf in gnovice's suggestion, but I think it is faster (don't quote me on that though).