Importing multiple files at once in matlab - matlab

thank you very much for reading this;
Problem: I have a question regarding importing in Matlab. I am trying to design a wind turbine. To do this I need to read many different text files containing the x and y coordinates defining the shape of the airfoil (all text files come in a folder) and analyse each one in turn.
My first try was
%% Reading the datafile
%Opening the filename in the format given to us by UIUC
fid=fopen('Filename',r);
%Skipping over the first lines (asterisk)
fscanf(fid,'%*s %*s %*s \n %*s %*s ',[1,5]);
%Reading over the Values of the columns
data=fscanf(fid,'%f %f \n',[2, inf]);
%Extracting the different types of data from the datafile
x=data(1,:);
y=data(2,:);
Such code has two problems:
The first problem is the fact that there is a variable number of words that i need to remove in the first column
Question 1: Is there a way to remove just the first column of data from the database or would I have to delete all the values?
The second problem is that I have 1550 tables to look at and changing the filename 1550 times will be very tedious and time-consuming.
Question 2: Is there a way of opening all the files from the folder, one at a time, in one go?

An easy way to read tabular files is to use readtable. For example, if you know your file has 2 header lines that you want to skip, then you can simply run
t = readtable('my_file', 'HeaderLines', 2);
This does, however, mean you are working with a table and not with a MATLAB array. If you prefer to keep an array, you can simply pass the table through table2array (example below).
In terms of dealing with lots of files, suppose your working directory has the following files:
a1.txt a3.txt b1.txt ...
a2.txt a4.txt b2.txt
You should first get an array that contains all the filenames and then iterate over this array. One way to do this would be as follows.
all_files = dir('*.txt') % Use a suitable glob to match your filenames
for i = 1:length(all_files)
data = table2array(readtable(all_files(i).name, 'HeaderLines', 2);
% do stuff
end
Alternatively, if you want to read the files manually:
all_files = dir('*.txt') % Use a suitable glob to match your filenames
for i = 1:length(all_files)
fid = fopen(all_files(i).name, 'r');
% do stuff
fclose(fid);
end
Note that the code assumes that all the data files have the same format.

Related

Read the data from specific line in MATLAB

I have a sequence of data files(".tab" files) with more than 11100 rows and 236 columns. Data begins from 297th line in one file and from 299th line in another file. How can I read the data from 297th row of each file in MATLAB R2014a?
I am not quite sure, bu it seems that a typical machine's memory can handle such a file size. In that case, you can use textscan or textread MATLAB built-in functions.
Nonetheless, if you really cannot import your data into MATLAB environment, set HeaderLines argument of textscan to the line of interest. A simple example can be found in MATLAB documentations, or:
SelectedData = textscan(ID,formatSpec,'HeaderLines',296); % Ignore 296 first lines of the data
First of all, I strongly recommend to review the MATLAB documentation. Assuming you have several files in hand (stored in fileNames:
for i = 1:numel(fileNames)
ID = fopen(fileNames{i});
formatSpec = '%s %[^\n]'; % Modify this based on your file structure
SelectedData{i} = textscan(ID,formatSpec,'HeaderLines',296);
fclose(ID);
end
SelectedData is a cell string containing all your data extracted from corresponding data (fileNames)

MATLAB: How can I efficiently read in these data files?

I have 100 data files in a folder called "Experiment1", and I need to take all of the data from them and put them into a single matrix. Each data file contains 15 columns and 40 rows of data.
The order in which the files are in the folder is arbitrary. It doesn't matter in what order they get put into the combined matrix.
I've written some code using dlmread that will do the job:
for i = 1:100
%% Read in the relevant file.
filename = ['File_' int2str(i) '.dat']
Data = dlmread(fullfile(pwd, 'Experiment1',filename));
%% Put all the data I need in a separate matrix
NeededData(1+((i-1)*40):i+((i-1)*40)-i+40,1:15) = Data(:,1:15);
end
However, there are two things I don't like about my code.
The files have random names at present, and I'd need to manually change all their names to "File_1.dat", "File_2.dat", etc.
The code is cumbersome and hard to read.
How could I do things better?
Since you've fixed the problem of defining the name of the files to be read with dir, you can improve the way you add the read data (Data) to the output matrix (NeededData).
You can sumultaneously read the input files and add the data to the output matrix by inserting the call to dlmread directly in the assignment statement:
files=dir('*.dat');
n_files=length(files)
% Initialize the output matrix as empty
NeededData_0=[]
for i=1:n_files
% Simultaneously read input file and assign data to the output matrinx
NeededData_0=[NeededData_0;dlmread(files(i).name)]
end
In case you prefer working with the inides (as in your origina approach), since you know in advance that all the files have the same (40) number of rows) you can simplify the notation as follows:
files=dir('*.dat');
n_files=length(files)
% Define the number of rows in each inout file
n_rows=40;
% Define the number of colums in each inout file
n_col=15;
NeededData_2=nan(n_rows*n_files,n_col)
% Define the sequence of rows
r_list=1:n_rows:n_rows*n_files
for i=1:3
Data=dlmread(files(i).name)
NeededData_2(r_list(i):r_list(i)+n_rows-1,:)=Data
end
Hope this helps.
Using the suggestion to use dir present in the answers I have made the following code, which is clearly an improvement on my earlier effort. I would welcome further improvements, or suggestions for alternative approaches.
files = dir('*.dat');
for i = 1:length({files.name})
%% Read in the relevant file.
Data = dlmread(files(i).name);
%% Put all the data I need in a separate matrix
NeededData(1+((i-1)*40):i+((i-1)*40)-i+40,1:15) = Data(:,1:15);
end

Is there a way to only read part of a line of an input file?

I have a routine that opens a lookup table file to see if a certain entry already exists before writing to the file. Each line contains about 2,500 columns of data. I need to check the first 2 columns of each line to make sure the entry doesn't exist.
I don't want to have to read in 2,500 columns for every line just to check 2 entries. I was attempting to use the fscanf function, but it gives me an Invalid size error when I attempt to only read 2 columns. Is there a way to only read part of each line of an input file?
if(exist(strcat(fileDirectory,fileName),'file'))
fileID = fopen(strcat(fileDirectory,fileName),'r');
if(fileID == -1)
disp('ERROR: Could not open file.\n')
end
% Read file to see if line already exists
dataCheck = fscanf(fileID, '%f %f', [inf 2]);
for i=1:length(dataCheck(:,1))
if(dataCheck(i,1) == sawAnglesDeg(sawCount))
if(dataCheck(i,2) == sarjAnglesDeg(floor((sawCount-1)/4)+1))
% This line has already been written in lookup table
lineExists = true;
disp('Duplicate lookup table line found. Skipping...\n')
break;
end
end
end
fclose(fileID);
end
Well, not really.
You should be able in a loop to do an fscanf of the first two doubles, followed by a fgetl to read the rest of the line, i.e. on the form:
while there_are_more_lines
dataCheck = fscanf(fileID, '%f', 2);
fgetl(fileID); % Read remainder of line, discarding it
% Do check here for each line
end
Since it is a text file, you can not really skip reading characters from the file. For binary files you can do an fseek, which can jump round in the file based on a byte-count - it can be used if you know exactly where the next line starts (in byte-count). But for a text file you do not know that, since each line will vary in length. If you save the data in a binary file instead, it would be possible to do something like that.
What I would probably do: Create two files, the first one containing the two "check-values", that could be read in quickly, and the other one containing the 2500 columns of data, with or without the two "check-values". They should be updated synchronously; when adding one line to the first file, also one line is added to the second file.
And I would definitely make a checkData matrix variable and keep that in memory as long as possible; when adding a new line to the file, also update the checkData matrix, so you should only need to read the file once initially and use the checkData matrix for the rest of the life of your program.
With textscan you can skip fields, parts of fields, or even "rest of line", so I would do this (based on MATLAB help example slightly modified):
fileID = fopen('data.dat');
data = textscan(fileID,'%f %f %*[^\n]');
fclose(fileID);
Then check data (should be the two columns you want) to see if any of those rows matches the requirements.
As #Jesper Grooss wrote, there is no solution to skip the remaining of a line without reading it. In a single text file context, a fastest solution would probably consist of
reading the entire file with textscan (one line of text into one cell element of a matrix)
appending the new line to the matrix even if it is a duplicate entry
uniquing the cell matrix with unique(cellmatrix, 'rows')
appending the new line to the text file if it corresponds to a new entry
The uniquing step replaces the putatively costly for loop.

Memory map file in MATLAB?

I have decided to use memmapfile because my data (typically 30Gb to 60Gb) is too big to fit in a computer's memory.
My data files consist two columns of data that correspond to the outputs of two sensors and I have them in both .bin and .txt formats.
m=memmapfile('G:\E-Stress Research\Data\2013-12-18\LD101_3\EPS/LD101_3.bin','format','int32')
m.data(1)
I used the above code to memory map my data to a variable "m" but I have no idea what data format to use (int8', 'int16', 'int32', 'int64','uint8', 'uint16', 'uint32', 'uint64', 'single', and 'double').
In fact I tried all of the data formats listed that MATLAB supports, but when I used the m.data(index number) I never get a pair of numbers (2 columns of data) which is what I expected, also the number will be different depending on the format I used.
If anyone has experience with memmapfile please help me.
Here are some smaller versions of my data files so people can understand how my data is structured:
cheers
James
memmapfile is designed for reading binary files, that's why you are having trouble with your text file. The data in there is characters, so you'll have to read them as characters and then parse them into numbers. More on that below.
The binary file appears to contain more than just a stream of floating point values written in binary format. I see identifiers (strings) and other things in the file as well. Your only hope of reading that is to contact the manufacturer of the device that created the binary file and ask them about how to read in such files. There'll probably be an SDK, or at least a description of the format. You might want to look into this as the floating point numbers in your text file might be truncated, i.e., you have lost precision compared to directly reading the binary representation of the floats.
Ok, so how to read your file with memmapfile? This post provides some hints.
So first we open your file as 'uint8' (note there is no 'char' option, so as a workaround we read the content of the file into a datatype of the same size):
m = memmapfile('RTL5_57.txt','Format','uint8'); % uint8 is default, you could leave that off
We can render the data read in as uint8 as characters by casting it to char:
c = char(m.Data(1:19)).' % read the first three lines. NB: transpose just for getting nice output, don't use it in your code
c =
0.398516 0.063440
0.399611 0.063284
0.398985 0.061253
As each line in your file has the same length (2*8 chars for the numbers, 1 tab and 2 chars for newline = 19 chars), we can read N lines from the file by reading N*19 values. So m.Data(1:19) gets you the first line, m.Data(20:38), the second line, and m.Data(20:57) the second and third lines. Read as much as you want at once.
Then we'll have to parse the read-in data into floating point numbers:
f = sscanf(c,'%f')
f =
0.3985
0.0634
0.3996
0.0633
0.3990
0.0613
All that's left now is to reshape them into your two column format
d = reshape(f,2,[]).'
d =
0.3985 0.0634
0.3996 0.0633
0.3990 0.0613
Easier ways than using memmapfile:
You don't need to use memmapfile to solve your problem, and I think it makes things more complicated. You can simply use fopen followed by fread:
fid = fopen('RTL5_57.txt');
c = fread(fid,Nlines*19,'*char');
% now sscanf and reshape as above
% NB: one can read the values the text file directly with f = fscanf(fid,'%f',Nlines*19).
% However, in testing, I have found calling fread followed by sscanf to be faster
% which will make a significant difference when reading such large files.
Using this you can read Nlines pairs of values at a time, process them and simply call fread again to read the next Nlines. fread remembers where it is in the file (as does fscanf), so simply use same call to get next lines. Its thus easy to write a loop to process the whole file, testing with feof(fid) if you are at the end of the file.
An even easier way is suggested here: use textscan. To slightly adapt their example code:
Nlines = 10000;
% describe the format of the data
% for more information, see the textscan reference page
format = '%f\t%f';
fid = fopen('RTL5_57.txt');
while ~feof(fid)
C = textscan(fid, format, Nlines, 'CollectOutput', true);
d = C{1}; % immediately clear C at this point if you need the memory!
% process d
end
fclose(fid);
Note again however that the fread followed by sscanf will be fastest. Note however that the fread method would die as soon as there is one line in the text file that doesn't exactly match your format. textscan is forgiving of whitespace changes on the other hand and thus more robust.

Fastest way to import CSV files in MATLAB

I've written a script that saves its output to a CSV file for later reference, but the second script for importing the data takes an ungainly amount of time to read it back in.
The data is in the following format:
Item1,val1,val2,val3
Item2,val4,val5,val6,val7
Item3,val8,val9
where the headers are on the left-most column, and the data values take up the remainder of the row. One major difficulty is that the arrays of data values can be different lengths for each test item. I'd save it as a structure, but I need to be able to edit it outside the MATLAB environment, since sometimes I have to delete rows of bad data on a computer that doesn't have MATLAB installed. So really, part one of my question is: Should I save the data in a different format?
Second part of the question:
I've tried importdata, csvread, and dlmread, but I'm not sure which is best, or if there's a better solution. Right now I'm using my own script using a loop and fgetl, which is horribly slow for large files. Any suggestions?
function [data,headers]=csvreader(filename); %V1_1
fid=fopen(filename,'r');
data={};
headers={};
count=1;
while 1
textline=fgetl(fid);
if ~ischar(textline), break, end
nextchar=textline(1);
idx=1;
while nextchar~=','
headers{count}(idx)=textline(1);
idx=idx+1;
textline(1)=[];
nextchar=textline(1);
end
textline(1)=[];
data{count}=str2num(textline);
count=count+1;
end
fclose(fid);
(I know this is probably terribly written code - I'm an engineer, not a programmer, please don't yell at me - any suggestions for improvement would be welcome, though.)
It would probably make the data easier to read if you could pad the file with NaN values when your first script creates it:
Item1,1,2,3,NaN
Item2,4,5,6,7
Item3,8,9,NaN,NaN
or you could even just print empty fields:
Item1,1,2,3,
Item2,4,5,6,7
Item3,8,9,,
Of course, in order to pad properly you would need to know what the maximum number of values across all the items is before hand. With either format above, you could then use one of the standard file reading functions, like TEXTSCAN for example:
>> fid = fopen('uneven_data.txt','rt');
>> C = textscan(fid,'%s %f %f %f %f','Delimiter',',','CollectOutput',1);
>> fclose(fid);
>> C{1}
ans =
'Item1'
'Item2'
'Item3'
>> C{2}
ans =
1 2 3 NaN %# TEXTSCAN sets empty fields to NaN anyway
4 5 6 7
8 9 NaN NaN
Instead of parsing the string textline one character at a time. You could use strtok to break the string up for example
stringParts = {};
tline = fgetl(fid);
if ~ischar(tline), break, end
i=1;
while 1
[stringParts{i},r]=strtok(tline,',');
tline=r;
i=i+1;
if isempty(r), break; end
end
% store the header
headers{count} = stringParts{1};
% convert the data into numbers
for j=2:length(stringParts)
data{count}(j-1) = str2double(stringParts{j});
end
count=count+1;
I've had the same problem with reading csv data in Matlab, and I was surprised by how little support there is for this, but then I just found the import data tool. I'm in r2015b.
On the top bar in the "Home" tab, click on "Import Data" and choose the file you'd like to read. An app window will come up like this:
Import Data tool screenshot
Under "Import Selection" you have the option to "generate function", which gives you quite a bit of customization options, including how to fill empty cells, and what you'd like the output data structure to be. Plus it's written by MathWorks, so it's probably utilizing the fastest available method to read csv files. It was almost instantaneous on my file.
Q1) If you know the max number of columns you can fill empty entries with NaN
Also, if all values are numerical, do you really need "Item#" column? If yes, you can use only "#", so all data is numerical.
Q2) The fastest way to read num. data from a file without mex-files is csvread.
I try to avoid using strings in csv files, but if I have to, I use my csv2cell function:
http://www.mathworks.com/matlabcentral/fileexchange/20135-csv2cell