I have a sequence of data files(".tab" files) with more than 11100 rows and 236 columns. Data begins from 297th line in one file and from 299th line in another file. How can I read the data from 297th row of each file in MATLAB R2014a?
I am not quite sure, bu it seems that a typical machine's memory can handle such a file size. In that case, you can use textscan or textread MATLAB built-in functions.
Nonetheless, if you really cannot import your data into MATLAB environment, set HeaderLines argument of textscan to the line of interest. A simple example can be found in MATLAB documentations, or:
SelectedData = textscan(ID,formatSpec,'HeaderLines',296); % Ignore 296 first lines of the data
First of all, I strongly recommend to review the MATLAB documentation. Assuming you have several files in hand (stored in fileNames:
for i = 1:numel(fileNames)
ID = fopen(fileNames{i});
formatSpec = '%s %[^\n]'; % Modify this based on your file structure
SelectedData{i} = textscan(ID,formatSpec,'HeaderLines',296);
fclose(ID);
end
SelectedData is a cell string containing all your data extracted from corresponding data (fileNames)
Related
thank you very much for reading this;
Problem: I have a question regarding importing in Matlab. I am trying to design a wind turbine. To do this I need to read many different text files containing the x and y coordinates defining the shape of the airfoil (all text files come in a folder) and analyse each one in turn.
My first try was
%% Reading the datafile
%Opening the filename in the format given to us by UIUC
fid=fopen('Filename',r);
%Skipping over the first lines (asterisk)
fscanf(fid,'%*s %*s %*s \n %*s %*s ',[1,5]);
%Reading over the Values of the columns
data=fscanf(fid,'%f %f \n',[2, inf]);
%Extracting the different types of data from the datafile
x=data(1,:);
y=data(2,:);
Such code has two problems:
The first problem is the fact that there is a variable number of words that i need to remove in the first column
Question 1: Is there a way to remove just the first column of data from the database or would I have to delete all the values?
The second problem is that I have 1550 tables to look at and changing the filename 1550 times will be very tedious and time-consuming.
Question 2: Is there a way of opening all the files from the folder, one at a time, in one go?
An easy way to read tabular files is to use readtable. For example, if you know your file has 2 header lines that you want to skip, then you can simply run
t = readtable('my_file', 'HeaderLines', 2);
This does, however, mean you are working with a table and not with a MATLAB array. If you prefer to keep an array, you can simply pass the table through table2array (example below).
In terms of dealing with lots of files, suppose your working directory has the following files:
a1.txt a3.txt b1.txt ...
a2.txt a4.txt b2.txt
You should first get an array that contains all the filenames and then iterate over this array. One way to do this would be as follows.
all_files = dir('*.txt') % Use a suitable glob to match your filenames
for i = 1:length(all_files)
data = table2array(readtable(all_files(i).name, 'HeaderLines', 2);
% do stuff
end
Alternatively, if you want to read the files manually:
all_files = dir('*.txt') % Use a suitable glob to match your filenames
for i = 1:length(all_files)
fid = fopen(all_files(i).name, 'r');
% do stuff
fclose(fid);
end
Note that the code assumes that all the data files have the same format.
I have 100 data files in a folder called "Experiment1", and I need to take all of the data from them and put them into a single matrix. Each data file contains 15 columns and 40 rows of data.
The order in which the files are in the folder is arbitrary. It doesn't matter in what order they get put into the combined matrix.
I've written some code using dlmread that will do the job:
for i = 1:100
%% Read in the relevant file.
filename = ['File_' int2str(i) '.dat']
Data = dlmread(fullfile(pwd, 'Experiment1',filename));
%% Put all the data I need in a separate matrix
NeededData(1+((i-1)*40):i+((i-1)*40)-i+40,1:15) = Data(:,1:15);
end
However, there are two things I don't like about my code.
The files have random names at present, and I'd need to manually change all their names to "File_1.dat", "File_2.dat", etc.
The code is cumbersome and hard to read.
How could I do things better?
Since you've fixed the problem of defining the name of the files to be read with dir, you can improve the way you add the read data (Data) to the output matrix (NeededData).
You can sumultaneously read the input files and add the data to the output matrix by inserting the call to dlmread directly in the assignment statement:
files=dir('*.dat');
n_files=length(files)
% Initialize the output matrix as empty
NeededData_0=[]
for i=1:n_files
% Simultaneously read input file and assign data to the output matrinx
NeededData_0=[NeededData_0;dlmread(files(i).name)]
end
In case you prefer working with the inides (as in your origina approach), since you know in advance that all the files have the same (40) number of rows) you can simplify the notation as follows:
files=dir('*.dat');
n_files=length(files)
% Define the number of rows in each inout file
n_rows=40;
% Define the number of colums in each inout file
n_col=15;
NeededData_2=nan(n_rows*n_files,n_col)
% Define the sequence of rows
r_list=1:n_rows:n_rows*n_files
for i=1:3
Data=dlmread(files(i).name)
NeededData_2(r_list(i):r_list(i)+n_rows-1,:)=Data
end
Hope this helps.
Using the suggestion to use dir present in the answers I have made the following code, which is clearly an improvement on my earlier effort. I would welcome further improvements, or suggestions for alternative approaches.
files = dir('*.dat');
for i = 1:length({files.name})
%% Read in the relevant file.
Data = dlmread(files(i).name);
%% Put all the data I need in a separate matrix
NeededData(1+((i-1)*40):i+((i-1)*40)-i+40,1:15) = Data(:,1:15);
end
I need to read a txt dataset and do some analytics by matlab. the structure of the txt file is like this:
ID Genre AgeScale
1 M 20-26
2 F 18-25
So, I want to load this txt file and build a matrix. I was wondering if someone could help me with this. I used fopen function but it gives me a single array not a matrix with 3 columns.
MATLAB has an interactive data importer. Just type uiimport in the command window. It allows you to:
Name the variable based on heading as shown in your example. You can also manually change them.
Specify variable (column) type, e.g. numeric, cell array of strings, etc.
Auto generate an import script for next use (if desired)
If it works for you then congratulations, you don't need to waste hours to write an data import script :)
Function fopen only returns the file ID and not the content of the file. Once you open the file you use the file ID to read line by line and then parse each line with strsplit using space as the delimiter.
Here's one simple way of doing so:
fid = fopen('textfile.txt');
tline = fgetl(fid);
n = 1;
while ischar(tline)
data(n,:) = strsplit(tline(1:end-1),' ');
n=n+1;
tline = fgetl(fid);
end
fclose(fid);
Keep in mind that the matrix data is type string and not numeric, so if you want to use the numeric values of your dataset, you'll need to take a look at the functions str2num (str2double in newer versions) and strtok to split the AgeScale strings with delimiter '-'.
I have decided to use memmapfile because my data (typically 30Gb to 60Gb) is too big to fit in a computer's memory.
My data files consist two columns of data that correspond to the outputs of two sensors and I have them in both .bin and .txt formats.
m=memmapfile('G:\E-Stress Research\Data\2013-12-18\LD101_3\EPS/LD101_3.bin','format','int32')
m.data(1)
I used the above code to memory map my data to a variable "m" but I have no idea what data format to use (int8', 'int16', 'int32', 'int64','uint8', 'uint16', 'uint32', 'uint64', 'single', and 'double').
In fact I tried all of the data formats listed that MATLAB supports, but when I used the m.data(index number) I never get a pair of numbers (2 columns of data) which is what I expected, also the number will be different depending on the format I used.
If anyone has experience with memmapfile please help me.
Here are some smaller versions of my data files so people can understand how my data is structured:
cheers
James
memmapfile is designed for reading binary files, that's why you are having trouble with your text file. The data in there is characters, so you'll have to read them as characters and then parse them into numbers. More on that below.
The binary file appears to contain more than just a stream of floating point values written in binary format. I see identifiers (strings) and other things in the file as well. Your only hope of reading that is to contact the manufacturer of the device that created the binary file and ask them about how to read in such files. There'll probably be an SDK, or at least a description of the format. You might want to look into this as the floating point numbers in your text file might be truncated, i.e., you have lost precision compared to directly reading the binary representation of the floats.
Ok, so how to read your file with memmapfile? This post provides some hints.
So first we open your file as 'uint8' (note there is no 'char' option, so as a workaround we read the content of the file into a datatype of the same size):
m = memmapfile('RTL5_57.txt','Format','uint8'); % uint8 is default, you could leave that off
We can render the data read in as uint8 as characters by casting it to char:
c = char(m.Data(1:19)).' % read the first three lines. NB: transpose just for getting nice output, don't use it in your code
c =
0.398516 0.063440
0.399611 0.063284
0.398985 0.061253
As each line in your file has the same length (2*8 chars for the numbers, 1 tab and 2 chars for newline = 19 chars), we can read N lines from the file by reading N*19 values. So m.Data(1:19) gets you the first line, m.Data(20:38), the second line, and m.Data(20:57) the second and third lines. Read as much as you want at once.
Then we'll have to parse the read-in data into floating point numbers:
f = sscanf(c,'%f')
f =
0.3985
0.0634
0.3996
0.0633
0.3990
0.0613
All that's left now is to reshape them into your two column format
d = reshape(f,2,[]).'
d =
0.3985 0.0634
0.3996 0.0633
0.3990 0.0613
Easier ways than using memmapfile:
You don't need to use memmapfile to solve your problem, and I think it makes things more complicated. You can simply use fopen followed by fread:
fid = fopen('RTL5_57.txt');
c = fread(fid,Nlines*19,'*char');
% now sscanf and reshape as above
% NB: one can read the values the text file directly with f = fscanf(fid,'%f',Nlines*19).
% However, in testing, I have found calling fread followed by sscanf to be faster
% which will make a significant difference when reading such large files.
Using this you can read Nlines pairs of values at a time, process them and simply call fread again to read the next Nlines. fread remembers where it is in the file (as does fscanf), so simply use same call to get next lines. Its thus easy to write a loop to process the whole file, testing with feof(fid) if you are at the end of the file.
An even easier way is suggested here: use textscan. To slightly adapt their example code:
Nlines = 10000;
% describe the format of the data
% for more information, see the textscan reference page
format = '%f\t%f';
fid = fopen('RTL5_57.txt');
while ~feof(fid)
C = textscan(fid, format, Nlines, 'CollectOutput', true);
d = C{1}; % immediately clear C at this point if you need the memory!
% process d
end
fclose(fid);
Note again however that the fread followed by sscanf will be fastest. Note however that the fread method would die as soon as there is one line in the text file that doesn't exactly match your format. textscan is forgiving of whitespace changes on the other hand and thus more robust.
I have 20 text files each containing a vector of size 180. How can I access each text file and assign the vector to a variable?
If you want to do it manually use Import Data under File (or use uiimport).
If you want to automate it use fid = fopen(filename) and then use var = textscan(fid, 'format') where format depends on how your vectors are structured. Spend some time reading doc textscan and doc fopen everything you probably need to know is in those two files. If your data is nicely structured look at doc importdata.