Importing matrix with different row length - matlab

I have large data files and I would like to import 12 columns of data for further use. However, the number of rows will be different in each file. I would import only the selected columns, but below the data I need there are some blank rows followed by extra numbers which aren't necessary, so I'm wondering how to import just the data I need. I don't mind specifying an end row, but this would be different for each case, and I'm not sure if I'm missing anything else obvious! To help, I've attached a print-screen of an example of the data I'm working with:
To summarise, I only require the "blue" data above the purple boxes. Each file I will use has the same layout, except that there may be more or fewer rows of data.

EDIT
I have updated the code to give you a better understanding of the process:
% An empty array:
importedarray = [];
% Open the data file to read from it:
fid = fopen( 'dummydata.txt', 'r' );
% Check that fid is not -1 (error opening the file).
% read each line of the data in a cell array:
cac = textscan( fid, '%s', 'Delimiter', '\n' );
% size(cac{1},1) must equal the # of rows in your data file.
totalRows = size(cac{1},1);
fprintf('Imported %d rows of data!\n',totalRows)
% Close the file as we don't need it anymore:
fclose( fid );
% for total rows in data
for k=1:totalRows
fprintf('Parsing data on row %d of %d...\n',k,totalRows);
currentRow = cac{1}{k,1};
fprintf('Row contains:\n%s\n',currentRow);
% finish (break from loop) when encounter an empty row:
if isempty(currentRow)
fprintf('Empty row encountered (#%d). Exiting the loop...\n',k);
break;
end
eachRowElement = strsplit(currentRow, ' ');
fprintf('Splitting row to %d elements...\n',length(eachRowElement));
fprintf('Converting row to floats...');
eachRowElement2num = cellfun(@str2num,eachRowElement,'UniformOutput',false);
fprintf('Done!\n');
fprintf('Converting cell to matrix...');
importedarray(k,:) = cell2mat(eachRowElement2num);
fprintf('Done!\n');
end
clearvars cac k fid totalRows currentRow eachRowElement eachRowElement2num;
Given your example image (all the columns of each row are filled with floats, and you stop at an empty row), this should do the job, giving info along the way. If not, you will be able to tell what the issue is by looking at the line where the code stopped. I include code to clear the unnecessary variables after importing. This must be done manually, or you can create a function to perform the task (a function's workspace is separate, so its temporary variables are deleted when the function returns; see: http://www.mathworks.com/help/matlab/ref/function.html). Hope this helps.
PS. In your example you keep 12 columns skipping the first two. The above code will import the whole row. You can choose what columns to keep later by using matrix indexing, like:
importedarray = importedarray(:,3:14);
If these columns don't change, you can incorporate this into your function.
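The same idea can be wrapped in a function, so the temporary variables disappear on return. The following is only a sketch: the name importUntilBlank and its cols argument are made up for illustration, it would be saved as importUntilBlank.m, and it assumes each data row holds whitespace-separated numbers with the same count on every row.

```matlab
function M = importUntilBlank(filename, cols)
% importUntilBlank  Read numeric rows up to the first blank line.
% (Hypothetical helper; assumes whitespace-separated numbers per row.)
fid = fopen(filename, 'r');
if fid == -1
    error('Could not open %s', filename);
end
closer = onCleanup(@() fclose(fid));   % close the file even if parsing fails

rows = {};
line = fgetl(fid);
% Stop at end of file (fgetl returns -1) or at the first blank line
while ischar(line) && ~isempty(strtrim(line))
    rows{end+1, 1} = sscanf(line, '%f').';   % one numeric row vector per line
    line = fgetl(fid);
end

M = vertcat(rows{:});   % errors if the rows differ in length
if nargin > 1
    M = M(:, cols);     % keep only the requested columns
end
end
```

For the layout in the question, a call such as importedarray = importUntilBlank('dummydata.txt', 3:14); would keep columns 3 to 14 only.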

Related

String vector to array

I am trying to make a script in Matlab that pulls data from a file and generates an array of data. Since the data is a string I've tried to split it into columns, take the transpose, and split it into columns again to populate an array.
When I run the script I don't get any errors, but I also don't get any useful data. I tell it to display the final vector (Full_Array) and I get {1×4 cell} 8 times. When I try to use strsplit I get the error:
'Error using strsplit (line 80) First input must be either a character vector or a string scalar.'
I'm pretty new to Matlab and I honestly have no clue how to fix it. After reading through similar threads and the documentation, I'm out of ideas. I've attached the code and the data to read in below. Thank you.
clear
File_Name = uigetfile; %Brings up windows file browser to locate .xyz file
Open_File = fopen(File_Name); %Opens the file given by File_Name
File2Vector = fscanf(Open_File,'%s'); %Prints the contents of the file to a 1xN vector
Vector2ColumnArray = strsplit(File2Vector,';'); %Splits the string vector from
%File2Vector into columns, forming an array
Transpose = transpose(Vector2ColumnArray); %Takes the transpose of Vector2ColumnArray
%making a column array into a row array
FullArray = regexp(Transpose, ',', 'split');
The data I am trying to read in comes from a .xyz file that I have titled methylformate.xyz, here is the data:
O2,-0.23799,0.65588,-0.69492;
O1,0.50665,0.83915,1.47685;
C2,-0.32101,2.08033,-0.75096;
C1,0.19676,0.17984,0.49796;
H4,0.66596,2.52843,-0.59862;
H3,-0.67826,2.36025,-1.74587;
H2,-1.03479,2.45249,-0.00927;
H1,0.23043,-0.91981,0.45346;
When I started using Matlab I also had problems with the data structure. The last line
FullArray = regexp(Transpose, ',', 'split');
splits each line and stores it in a cell array. In order to access the individual strings you have to index with curly brackets into FullArray:
FullArray{1}{1} % -> 'O2'
FullArray{1}{2} % -> '-0.23799'
FullArray{2}{1} % -> 'O1'
FullArray{2}{2} % -> '0.50665'
Thereby the first number corresponds to the row and the second to the particular element in the row.
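To go from these nested cells to numbers, one option is to stack the rows and convert the coordinate strings with str2double. This is a small sketch on two hard-coded rows in the same shape as FullArray; it assumes every entry has exactly four fields, so any empty trailing entry would have to be dropped first. The names labels and coords are made up for illustration.

```matlab
% Two entries in the same shape that regexp(..., 'split') returns
FullArray = {{'O2', '-0.23799', '0.65588', '-0.69492'}, ...
             {'O1', '0.50665', '0.83915', '1.47685'}};

rows   = vertcat(FullArray{:});        % 2x4 cell array of character vectors
labels = rows(:, 1);                   % atom labels stay as text
coords = str2double(rows(:, 2:4));     % 2x3 numeric matrix of coordinates
```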
However, there are easier functions in Matlab which load text files based on regular expressions.
Usually, the easiest function for reading mixed data is readtable.
data = readtable('methylformate.txt');
However, in your case this is more complex because
readtable doesn't recognise the .xyz extension, so you'd have to copy the file to .txt (or pass 'FileType','text')
The semi-colons confuse the read and make the last column characters
You can loop through each row and use textscan like so:
fid = fopen('methylformate.xyz');
tline = fgetl(fid);
myoutput = cell(0,4);
while ischar(tline)
myoutput(end+1,:) = textscan(tline, '%s %f %f %f %*[^\n]', 'delim', ',');
tline = fgetl(fid);
end
fclose(fid);
Output is a cell array of strings or doubles (as appropriate).
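If the file is small enough to hold in memory, another possibility is to read it as one string, drop the semicolons, and parse everything with a single textscan call. The sketch below writes a two-line sample file first so that it is self-contained; the variable names atoms and xyz are assumptions, not part of the original code.

```matlab
% Write a small sample in the same format as methylformate.xyz
fn = [tempname '.xyz'];
fid = fopen(fn, 'w');
fprintf(fid, 'O2,-0.23799,0.65588,-0.69492;\nO1,0.50665,0.83915,1.47685;\n');
fclose(fid);

txt = fileread(fn);              % whole file as one character vector
txt = strrep(txt, ';', '');      % remove the trailing semicolons
C = textscan(txt, '%s %f %f %f', 'Delimiter', ',');

atoms = C{1};                    % cell array of atom labels
xyz   = [C{2:4}];                % N-by-3 matrix of coordinates
delete(fn);
```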

Importing a txt file into Matlab and reading the data

I need to import a .txt datafile into Matlab. The file has been made into 3 columns. Each column has specific numbers for a given variable. The script code must be able to do the following,
Requirement
1) import the data from txt into Matlab
2) Matlab should remove the values from the columns if the values are out of a certain range
3) Matlab should tell which line and what type of error.
My Approach
I have tried using the following approach,
function data = insertData(filename)
filename = input('Insert the name of the file: ', 's');
data = load(filename);
Column1 = data(:,1);
Column2 = data(:,2);
Column3 = data(:,3);
%Ranges for each column
nclm1 = Column1(Column1>0);
nclm2 = Column2(Column2 >= 10 & Column2 <= 100);
nclm3 = Column3(Column3>0);
%Final new data columns within the ranges
final = [nclm1, nclm2, nclm3];
end
Problem
The above code has the following problems:
1) Matlab is not saving the imported data as 'data' after the user inserts the name of the file. Hence I don't know why my code is wrong.
filename =input('Insert the name of the file: ', 's');
data = load(filename);
2) The columns in the end do not have the same dimensions because I can see that Matlab removes values from the columns independently. Therefore is there a way in which I can make Matlab remove values/rows from a matrix rather than the three 'vectors', given a range.
1) Not sure what you mean by this. I created a sample text file and Matlab imports the data as data just fine. However, you are only returning the original unfiltered data so maybe that is what you mean??? I modified it to return the original data and the filtered data.
2) You need to OR the bad indices together so that the same rows are removed from every column, like this. Note I made some other edits; see the comments in the code below:
function [origData, filteredData]= insertData(filename)
% You pass in filename then overwrite it ...
% Modified to only prompt if not passed in.
if ~exist('filename','var') || isempty(filename)
filename = input('Insert the name of the file: ', 's');
end
origData = load(filename);
% Range checks for each column (true marks a value that is out of range)
% Note: return these if you want to know which data was filtered for
% which reason
badIdx1 = origData(:,1) <= 0;
badIdx2 = origData(:,2) < 10 | origData(:,2) > 100;
badIdx3 = origData(:,3) <= 0;
totalBad = badIdx1 | badIdx2 | badIdx3;
%Final new data columns within the ranges
filteredData = origData(~totalBad,:);
end
Note: you mentioned you want to know which line failed and for which type of error. That information is now contained in badIdx1, badIdx2 and badIdx3, so you can return them, print a message to the screen, or do whatever you need to display that information.
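As an illustration of the reporting step, here is a sketch on made-up sample data. A row is flagged when column 1 is not positive, column 2 is outside [10, 100], or column 3 is not positive, following the ranges in the question. The message texts and the variable names are assumptions.

```matlab
% Made-up sample data: three columns, one row per measurement
data = [ 1 50  2;
        -1 20  3;
         4  5  6;
         7 80 -2];

% One logical mask per rule (true means the row violates that rule)
bad1 = data(:,1) <= 0;
bad2 = data(:,2) < 10 | data(:,2) > 100;
bad3 = data(:,3) <= 0;
masks = [bad1, bad2, bad3];

reasons = {'column 1 is not positive', ...
           'column 2 is outside [10, 100]', ...
           'column 3 is not positive'};

% Report every violated rule, row by row
for r = find(any(masks, 2)).'
    for c = find(masks(r, :))
        fprintf('Line %d: %s\n', r, reasons{c});
    end
end

% Keep only the rows that pass all three checks
filtered = data(~any(masks, 2), :);
```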

Read a text file and separate the data into different columns and different tables in MATLAB

I have a huge text file that needs to be read and processed in MATLAB. This file at some points contain text to indicate that a new data series has started.
I have searched here but can't find any simple solution.
So what I want to do is to read the data in the file, put the data in a table in three different columns and when it finds text it should create a new table. It should repeat this process until the entire document is scanned.
This is how the document looks like:
time V(A,B) I(R1)
Step Information: X=1 (Run: 1/11)
0.000000000000000e+000 -2.680148e-016 0.000000e+00
9.843925313007988e-012 -4.753470e-006 2.216314e-011
1.000052605772457e-011 -4.835427e-006 2.552497e-011
1.031372754715773e-011 -4.999340e-006 -3.042096e-012
1.094013052602406e-011 -5.327165e-006 -1.206968e-011
Step Information: X=1 (Run: 2/11)
0.000000000000000e+000 -2.680148e-016 0.000000e+000
9.843925313007988e-012 -4.753470e-006 2.216314e-011
1.000052605772457e-011 -4.835427e-006 2.552497e-011
1.031372754715773e-011 -4.999340e-006 -3.042096e-012
1.094013052602406e-011 -5.327165e-006 -1.206968e-011
A rather crude approach is to read the file line by line and check if the line consists of three numbers. If it does, then append this to a temporary matrix. When you finally get to a line that doesn't contain three numbers, append this matrix as an element in a cell array, clear the temporary matrix and continue.
Something like this would work, assuming that the file is stored in 'file.txt':
%// Open the file
f = fopen('file.txt', 'r');
%// Initialize empty cell array
data = {};
%// Initialize temporary matrix
temp = [];
%// Loop over the file...
while true
%// Get a line from the file
line = fgetl(f);
%// If we reach the end of the file, get out
if line == -1
%// Last check before we break
%// Check if the temporary matrix isn't empty and add
if ~isempty(temp)
data = [data; temp];
end
break;
end
%// Else, check to see if this line contains three numbers
numbers = textscan(line, '%f %f %f');
%// If this line doesn't consist of three numbers...
if all(cellfun(@isempty, numbers))
%// If the temporary matrix is empty, skip
if isempty(temp)
continue;
end
%// Concatenate to cell array
data = [data; temp];
%// Reset temporary matrix
temp = [];
%// If this does, then create a row vector and concatenate
else
temp = [temp; numbers{:}];
end
end
%// Close the file
fclose(f);
The code is pretty self-explanatory but let's go into it to be sure you know what's going on. First open up the file with fopen to get a "pointer" to the file, then initialize our cell array that will contain our matrices as well as the temporary matrix used when reading in matrices in between header information. After that, we simply loop over each line of the file, grabbing a line with fgetl using the file pointer we created. We then check to see if we have reached the end of the file and, if we have, we check whether the temporary matrix has any numerical data in it. If it does, we add this to our cell array and then finally get out of the loop. Once the loop ends, we use fclose to close the file and clean things up.
Now the heart of the operation is what follows after this check. We use textscan and search for three numbers separated by spaces. That's done with the '%f %f %f' format specifier. This should give you a cell array of three elements if you are successful with numbers. If this is correct, then convert this cell array of elements into a row of numbers and concatenate this into the temporary matrix. Doing temp = [temp; numbers{:}]; facilitates this concatenation. Simply put I piece together each number and concatenate them horizontally to create a single row of numbers. I then take this row and concatenate this as another row in the temporary matrix.
Should we finally get to a line where it's all text, this will give you all three elements in the cell array found by textscan to be empty. That's the purpose of the all and cellfun call. We search each element in the cell and see if it's empty. If every element is empty, this is a line that is text. If this situation arises, simply take the temporary matrix and add this as a new entry into your cell array. You'd then reset the temporary matrix and start the logic over again.
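To see the test in isolation, here is a small sketch that applies the same textscan call to one numeric line and one text line taken from the example file. The variable names are made up for illustration.

```matlab
% A numeric line yields three non-empty cells
numLine = textscan('9.843925313007988e-012 -4.753470e-006 2.216314e-011', '%f %f %f');

% A text line yields three empty cells
txtLine = textscan('Step Information: X=1 (Run: 1/11)', '%f %f %f');

isTextNum = all(cellfun(@isempty, numLine));   % false: this line is data
isTextTxt = all(cellfun(@isempty, txtLine));   % true: this line is text

row = [numLine{:}];   % 1x3 row vector, ready to append to the temporary matrix
```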
However, we also have to take into account that there may be multiple consecutive lines of text. That's what the additional if statement inside the first if block using all is for. If a line of text directly follows another line of text, the temporary matrix will still be empty, so you should check whether it is empty before you try to concatenate it. If it's empty, don't bother and just continue.
After running this code, I get the following for my data matrix:
>> format long g
>> celldisp(data)
data{1} =
0 -2.680148e-16 0
9.84392531300799e-12 -4.75347e-06 2.216314e-11
1.00005260577246e-11 -4.835427e-06 2.552497e-11
1.03137275471577e-11 -4.99934e-06 -3.042096e-12
1.09401305260241e-11 -5.327165e-06 -1.206968e-11
data{2} =
0 -2.680148e-16 0
9.84392531300799e-12 -4.75347e-06 2.216314e-11
1.00005260577246e-11 -4.835427e-06 2.552497e-11
1.03137275471577e-11 -4.99934e-06 -3.042096e-12
1.09401305260241e-11 -5.327165e-06 -1.206968e-11
To access a particular "table", do data{ii} where ii is the table you want to access that was read in from top to bottom in your text file.
The most versatile way is to read line by line using textscan. If you want to speed this process up, you can do a dummy read first:
i.e. you loop through all the lines without storing the data and decide which lines are text lines and which are numbers, recording the number of data lines in each set.
You then know enough about the layout to read the arrays quickly, which greatly reduces the time it takes to store the data in your new arrays.
Your second loop is the one that actually reads the data into the array(s). You should now know which lines to skip. You can also pre-allocate the arrays within the data cell if you wish to.
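fid = fopen('file.txt','r');
data = {};
nlines = [];
% first pass: count the data lines that follow each text line
% (assumes the file starts with a text line, as in the example)
k = 0; % counter for data sets
while ~feof(fid)
    line = fgetl(fid);
    if ~ischar(line) || isempty(strtrim(line))
        continue % ignore blank lines
    end
    % check if is data or text; the set includes '-' and 'e'
    % so that values such as -4.753470e-006 count as data
    if all(ismember(line,' 0123456789+-.eE')) % is it data
        nlines(k) = nlines(k)+1;
    else % is it text
        k = k+1;
        nlines(k) = 0;
    end
end
frewind(fid); % go back to start of file
% You could preallocate the data array here if you wished
% second pass: now get the data
for aa = 1 : length(nlines)
    % skip the text line (and any blank remainder left by the previous read)
    line = fgetl(fid);
    while ischar(line) && isempty(strtrim(line))
        line = fgetl(fid);
    end
    if nlines(aa) == 0
        continue
    end
    data{aa} = textscan(fid,'%f%f%f',nlines(aa)); % 1x3 cell, one column each
end
fclose(fid);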

Create a 2 column matrix with 2 different format types

I'm very new to Matlab and I'm having trouble reading a binary file into a matrix. The problem is that I am trying to read the binary file into a two-column matrix (which has 100000's of rows) where each column is a different format type.
I want column 1 to be in 'int8' format and column 2 to be a 'float'
This is my attempt so far:
FileID= fopen ('MyBinaryFile.example');
[A,count] = fread(FileID,[nrows, 2],['int8','float'])
This is not working because I get the error message 'Error using fread' 'Invalid Precision'
I will then go on to plot once I have successfully done this.
Probably a very easy solution to someone with matlab experience but I haven't been successful at finding a solution on the internet.
Thanks in advance to anyone who can help.
You should be aware that Matlab cannot hold different data type in a matrix (it can do so in a cell array but this is another topic). So there is no point trying to read your mixed type file in one go in one single matrix ... it is not possible.
Unless you want a cell array, you will have to use 2 different variables for your 2 columns of different type. Once this is established, there are many ways to read such a file.
For the purpose of the example, I had to create a binary file as you described. This is done this way:
%% // write example file
A = int8(-5:5) ; %// a few "int8" data
B = single(linspace(-3,1,11)) ; %// a few "float" (=single) data
fileID = fopen('testmixeddata.bin','w');
for il=1:11
fwrite(fileID,A(il),'int8');
fwrite(fileID,B(il),'single');
end
fclose(fileID);
This creates a 2 column binary file, with first column: 11 values of type int8 going from -5 to +5, and second column: 11 values of type float going from -3 to 1.
In each of the solutions below, the first column will be read into a variable called C, and the second column into a variable called D.
1) Read all data in one go - convert to proper type after
%% // Read all data in one go - convert to proper type after
fileID = fopen('testmixeddata.bin');
R = fread(fileID,'uint8=>uint8') ; %// read all values, most basic data type (unsigned 8 bit integer)
fclose(fileID);
R = reshape( R , 5 , [] ) ; %// reshape data into a matrix (5 is because 1+4byte=5 byte per column)
temp = R(1,:) ; %// extract data for first column into temporary variable (OPTIONAL)
C = typecast( temp , 'int8' ) ; %// convert into "int8"
temp = R(2:end,:) ; %// extract data for second column
D = typecast( temp(:) , 'single' ) ; %// convert into "single/float"
This is my favourite method, especially for speed, because it minimizes the read/seek operations on disk, and most of the post-processing is done in memory (much, much faster than disk operations).
Note that the temporary variable is only there for clarity; you can avoid it altogether if you get your indexing into the raw data right.
The key thing to understand is the use of the typecast function. And the good news is it got even faster since 2014b.
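As a quick illustration of what typecast does, here is a sketch on hand-picked values (not data from the example file). It reinterprets the same bytes as another type without changing them, which is different from a numeric conversion such as int8(x).

```matlab
% The same two bytes, read as unsigned and then as signed 8-bit integers
raw      = uint8([251 5]);
asSigned = typecast(raw, 'int8');      % 251 becomes -5, 5 stays 5

% A single-precision value is four bytes, and the round trip is lossless
bytes = typecast(single(1.5), 'uint8');   % 1x4 uint8
back  = typecast(bytes, 'single');        % 1.5 again

% For comparison, a numeric conversion saturates instead of reinterpreting
converted = int8(raw);                 % 251 saturates to 127
```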
2) Read column by column (using "skipvalue") - 2 pass approach
%% // Read column by column (using "skipvalue") - 2 pass approach
col1size = 1 ; %// size of data in column 1 (in [byte])
col2size = 4 ; %// size of data in column 2 (in [byte])
fileID = fopen('testmixeddata.bin');
C = fread(fileID,'int8=>int8',col2size) ; %// read all "int8" values, skipping all "float"
fseek(fileID,col1size,'bof') ; %// rewind to beginning of column 2 at the top of the file
D = fread(fileID,'single=>single',col1size) ; %// read all "float" values, skipping all "int8"
fclose(fileID);
That works too. It works fine ... but probably much slower than above. Although it may be clearer code to read for someone else ... I find that ugly (and yet I've used this way for several years until I got to use the method above).
3) Read element by element
%% // Read element by element (slow - not recommended)
fileID = fopen('testmixeddata.bin');
C=[];D=[];
while ~feof(fileID)
try
C(end+1) = fread(fileID,1,'int8=>int8') ;
D(end+1) = fread(fileID,1,'single=>single') ;
catch
disp('reached End Of File')
end
end
fclose(fileID);
Talking about ugly code ... that does work too, and if you were writing C code it would be more than ok. But in Matlab ... please avoid ! (well, your choice ultimately)
Merging in one variable
If really you want all of that in one single variable, it could be a structure or a cell array. For a cell array (to keep matrix indexing style), simply use:
%% // Merge into one "cell array"
Data = { C , D } ;
Data =
[11x1 int8] [11x1 single]
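For the structure option mentioned above, a minimal sketch could look like this. The field names id and value are made up, and C and D are rebuilt here as column vectors so that the example is self-contained. A table is shown as a further alternative that keeps one type per column.

```matlab
% Stand-ins for the two columns read from the file (as column vectors)
C = int8(-5:5).';
D = single(linspace(-3, 1, 11)).';

% Structure: each field keeps its own type
Data = struct('id', C, 'value', D);

% Table: row-wise indexing with one type per column
T = table(C, D, 'VariableNames', {'id', 'value'});
firstRow = T(1, :);
```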

Importing large amount of data into MATLAB?

I have a text file that is ~80MB. It has 2 cols and around 6e6 rows. I would like to import the data into MATLAB, but it is too much data to do with the load function. I have been playing around with the fopen function but can't get anything to work.
Ideally I would like to take the first col of data and import and eventually have it in one large array in MATLAB. If that isn't possible, I would like to split it into arrays of 34,013 in length. I would also like to do the same for the 2nd col of data.
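fileID = fopen('yourfilename.txt');
formatSpec = '%f %f';
C = cell(0,2); % one row of cells per block: {column 1, column 2}
while ~feof(fileID)
C(end+1,:) = textscan(fileID,formatSpec,34013); % read the next 34,013 rows
end
fclose(fileID);
Hope this helps.
Edit:
The reason you are getting an error is that textscan returns a cell array with one cell per column. So you need to take the columns out individually and handle them.
For example, for the first block:
column1data = reshape(C{1,1},301,113);
column2data = reshape(C{1,2},301,113);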
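If the whole file fits in memory, a single textscan call without a repeat count reads every row, and each column can be split into fixed-length blocks afterwards. The sketch below uses a tiny generated file and a block length of 3 in place of 34,013; the names blockLen, nFull and blocks1 are assumptions.

```matlab
% Write a small two-column sample file
fn = [tempname '.txt'];
fid = fopen(fn, 'w');
fprintf(fid, '%f %f\n', [1:6; 11:16]);   % rows: 1 11, 2 12, ..., 6 16
fclose(fid);

% Read every row in one call; C is a 1x2 cell, one cell per column
fid = fopen(fn, 'r');
C = textscan(fid, '%f %f');
fclose(fid);
col1 = C{1};
col2 = C{2};

% Split column 1 into blocks of blockLen rows (one block per matrix column),
% ignoring any incomplete block at the end
blockLen = 3;
nFull = floor(numel(col1) / blockLen);
blocks1 = reshape(col1(1:nFull*blockLen), blockLen, nFull);
delete(fn);
```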
You may also consider converting your file to a binary format if your data file does not change each time you want to load it. It will then load much faster.
Or you may do a "transparent binary conversion" as in the function below. Only the first load of the data will be slow; all subsequent loads will be fast.
function Data = ReadTextFile(FileName,NColumns)
MatFileName = sprintf('%s.mat',FileName); % binary file name
if exist(MatFileName,'file')==2 % if it exists
S = load(MatFileName,'Data'); % load it instead of
Data = S.Data; % the original text file
return;
end
fh = fopen(FileName); % if binary file does not exist load data from the original text file
fh_closer = onCleanup( @() fclose(fh) ); % the file will be closed properly even in case of error
Data = fscanf(fh, repmat('%f ',1,NColumns), [NColumns,inf]);
Data = Data';
save(MatFileName,'Data'); % and make binary "cache" of the original data for faster subsequent reading
end
Do not forget to remove the MAT file when the original data file is changed.
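One way to avoid forgetting is to compare modification times before trusting the MAT file. This is only a sketch: isCacheFresh is a hypothetical helper (saved as isCacheFresh.m), and it assumes the file system timestamps reported by dir are reliable.

```matlab
function fresh = isCacheFresh(textFile, matFile)
% isCacheFresh  True if matFile exists and is at least as new as textFile.
% (Hypothetical helper, not part of the function above.)
fresh = false;
if exist(matFile, 'file') ~= 2
    return;                 % no cache yet
end
t = dir(textFile);
m = dir(matFile);
if isempty(t) || isempty(m)
    return;                 % one of the files could not be listed
end
fresh = m.datenum >= t.datenum;   % cache written after the last edit of the text file
end
```

Inside ReadTextFile, the test exist(MatFileName,'file')==2 could then be replaced by isCacheFresh(FileName, MatFileName), so that a stale cache is rebuilt from the text file.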