How to read text fields into MATLAB and create a single matrix

I have a huge CSV file that has a mix of numerical and text datatypes. I want to read this into a single matrix in Matlab. I'll use a simpler example here to illustrate my problem. Let's say I have this CSV file:
1,foo
2,bar
I am trying to read this into MATLAB using:
A=fopen('filename.csv');
B=textscan(A,'%d %d', 'delimiter',',');
C=cell2mat(B);
The first two lines run, but the problem is that textscan doesn't create a 2x2 matrix; instead it creates a 1x2 cell array in which each cell holds one column as an array. So I try to use the last line to combine the arrays into one big matrix, but it generates an error because the arrays have different datatypes.
Is there a way to get around this problem? Or a better way to combine the arrays?

I am not sure if combining them is a good idea. You would likely be better off keeping them separate.
I changed your code, so that it works better:
clear
clc
A=fopen('filename.csv');
B=textscan(A,'%d %s', 'delimiter',',')
fclose(A)
Looking at the results
K>> B{1}
ans =
1
2
K>> B{2}
ans =
'foo'
'bar'
Really, I think this is the format that is most useful. If anything, most people would want to break this cell array into smaller chunks
num = B{1}
txt = B{2}
Why are you trying to combine them? They are already together in a cell array, and that is the most combined you are going to get.
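If one table-like variable is still wanted, a minimal sketch (assuming the two-line example file from the question) is to wrap each number in its own cell with num2cell and place the text column beside it. The name combined is only illustrative. The result is a 2x2 cell array rather than a numeric matrix, since a numeric matrix cannot hold text.

```matlab
% Write the two-line example file so the sketch is self-contained.
fid = fopen('filename.csv','w');
fprintf(fid,'1,foo\n2,bar\n');
fclose(fid);

% Read the number column as %d and the text column as %s.
fid = fopen('filename.csv');
B = textscan(fid,'%d %s','Delimiter',',');
fclose(fid);

% Wrap each number in its own cell, then put the text column next to it.
combined = [num2cell(B{1}) B{2}];   % 2x2 cell array

% Curly braces return the contents of one cell.
firstNumber = combined{1,1};
secondText  = combined{2,2};
```

Indexing with parentheses, as in combined(2,:), returns a smaller cell array, while curly braces return the stored value itself.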

There is a natural solution to this, but it requires the Statistics Toolbox (version 6.0 or higher). Mixed data types can be read into a dataset array. See the MathWorks help page on dataset arrays.

I believe you can't use textscan for this purpose. I'd use fscanf, which always gives you a matrix of the size you specify. If you don't know the layout of the data, it gets kind of tricky, however.
fscanf works as follows:
fscanf(fid, format, size)
where fid is the file identifier returned by fopen,
format describes the file format and how you are reading the data (['%d' ',' '%s'] matches the lines of your example file), and
size is the matrix dimensions (for example [2 2]). Note that fscanf returns a numeric array, so text read with %s comes back as character codes rather than as strings.
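To see what fscanf actually returns for mixed data, here is a small sketch on the example file. It leaves out the size argument so that everything is read into one column vector. The index positions used to pull the pieces back out assume that every text field is exactly three letters long, which holds for this example only.

```matlab
% Write the two-line example file so the sketch is self-contained.
fid = fopen('filename.csv','w');
fprintf(fid,'1,foo\n2,bar\n');
fclose(fid);

% Read a number, a literal comma, then a string, repeated until end of file.
fid = fopen('filename.csv');
A = fscanf(fid,'%d,%s');
fclose(fid);

% A is a numeric column vector: each number followed by the character
% codes of its text field.
numbers = A([1 5]);            % assumes 3-letter text fields
words = char(A([2:4; 6:8]));   % convert the character codes back to text
```

Because the text arrives as character codes, this approach needs extra bookkeeping as soon as the text fields differ in length.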

Related

Is there a way to stack multiple 2x2 matrices in MATLAB into a multidimensional array in a simpler way (e.g. without using "cat" or "reshape")?

I am receiving a text file with 1000 matrices of size 2x2 each day from someone, in the following format (only 3 matrices are shown here instead of 1000):
0.96875000 0.03125000
0.03125000 0.96875000
0.96875000 0.01562500
0.03125000 0.98437500
0.99218800 0.03125000
0.00781250 0.96875000
I need to make a 2x2x1000 array in MATLAB. Ideally I could do something simple like:
[0.96875000 0.03125000
0.03125000 0.96875000;
0.96875000 0.01562500
0.03125000 0.98437500;
0.99218800 0.03125000
0.00781250 0.96875000]
After reading the MATLAB documentation on multidimensional arrays and the MATLAB documentation for the cat function, I figured out that I could make the required array in the following way (the first argument of cat is 3 because I'm concatenating the 2x2 matrices along the 3rd dimension):
cat(3,...
[0.96875000 0.03125000
0.03125000 0.96875000],...
[0.96875000 0.01562500
0.03125000 0.98437500],...
[0.99218800 0.03125000
0.00781250 0.96875000])
But that does not work if I put spacing between the lines as in my "ideal" example above, and the need for all the commas and dots makes it a bit uglier in my opinion.
While writing this question, I have discovered that I can run my "ideal" example and then use reshape, which I prefer over my solution using the cat function. For this, I don't even need the semi-colons. However Cris Luengo correctly pointed out in the comments that reshape is not enough and permute is also needed, and then Luis Mendo pointed out in chat that the solution is not so simple:
permute(reshape(ideal.',2,2,[]),[2 1 3])
Andras Deak has done what we thought was impossible, which is to remove the transpose, but the solution is still quite complicated, and was not easy to engineer:
permute(reshape(ideal,2,[],2),[1 3 2])
Ideally one would not need to use cat or reshape to make a 3D array, when the original data is already so nicely formatted in what the human eye can already see is a 3D array of several 2x2 matrices.
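As a quick check, the three approaches can be compared on the three sample matrices. This is only a sketch; the variable names viaCat, viaTranspose and viaReshape are made up for illustration.

```matlab
% The six rows from the question, stacked as a 6x2 matrix.
ideal = [0.96875000 0.03125000
         0.03125000 0.96875000
         0.96875000 0.01562500
         0.03125000 0.98437500
         0.99218800 0.03125000
         0.00781250 0.96875000];

% Reference result: concatenate each 2x2 block along the 3rd dimension.
viaCat = cat(3, ideal(1:2,:), ideal(3:4,:), ideal(5:6,:));

% Transpose, reshape into 2x2 slices, then transpose each slice back.
viaTranspose = permute(reshape(ideal.',2,2,[]),[2 1 3]);

% Split the rows into (row-in-block, block), then reorder the dimensions.
viaReshape = permute(reshape(ideal,2,[],2),[1 3 2]);

% All three give the same 2x2x3 array.
allAgree = isequal(viaCat, viaTranspose, viaReshape);
```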
Is there a simpler way to build the 3D array in MATLAB using the data in the format I have?
So far I have done the following on my own:
Searched online and found the above two MATLAB documentation articles, which led me to the above solution using cat
Came up with the above solution using reshape while writing this question, then it got improved by Cris and Luis in the comments and chat 😊.
Also: I tried saving the data in a .txt file and clicked import in MATLAB, knowing that the import GUI gives some options for how the data is to be organized in the resulting MATLAB array, but there did not seem to be any option to make this a 3D array.
Indeed there is no "direct" way to import this text as a 3D matrix. This is the easiest way I can come up with:
Save the input as a .txt file
Use the import tool (Import Data button in the Variable toolbar) to import the data as a Mx2 matrix. Choose "Numeric Matrix" as "Output Type". And you can "exclude rows with" "blank cells" to avoid the empty rows.
Besides reshape() and permute(), using a cell array to format it as below might be more intuitive and less error-prone for some people. Here m is the imported Mx2 matrix.
% The number of 2x2 matrices
N = size(m,1)/2;
% Split each 2x2 matrix into a cell
c = mat2cell(m, 2*ones(1,N), 2);
% Concatenate along the 3rd dimension
output3DMatrix = cat(3, c{:});
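For a daily file, the import step can also be scripted instead of using the import tool. The sketch below is one possible way, assuming a file with two numbers per line and no blank rows; the file name matrices.txt is hypothetical. It writes the three sample matrices first so that it runs on its own.

```matlab
% Write three 2x2 matrices, one matrix row per line (values from the question).
vals = [0.96875000 0.03125000; 0.03125000 0.96875000; ...
        0.96875000 0.01562500; 0.03125000 0.98437500; ...
        0.99218800 0.03125000; 0.00781250 0.96875000];
fid = fopen('matrices.txt','w');       % hypothetical file name
fprintf(fid,'%.8f %.8f\n',vals.');     % transpose so each row is written in order
fclose(fid);

% Read both columns and collect them into one Mx2 numeric matrix.
fid = fopen('matrices.txt');
C = textscan(fid,'%f %f','CollectOutput',true);
fclose(fid);
m = C{1};

% Split into 2x2 blocks and stack them along the 3rd dimension.
N = size(m,1)/2;
c = mat2cell(m, 2*ones(1,N), 2);
output3DMatrix = cat(3, c{:});
```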

How to read a complex 3D matrix (binary file) in Matlab without using interleaved/reshaping method?

I have a very large 3D matrix whose data was written to disk for future use. Writing the matrix into a binary file is easy; reading it back, however, causes some issues.
Write to bin:
z=repmat(complex(rand(5),rand(5)),[1 1 5]);
z_imag = imag(z);
z_real = real(z);
adjacent = [z_real z_imag];
fileID = fopen('complex.bin','w');
fwrite(fileID,adjacent,'double');
fclose(fileID);
And now, I try to read it back using memmapfile:
m = memmapfile('complex.bin', 'Offset', 0, 'Format', {'double' [5,5,5] 'x'});
complexValues = complex(m.Data(:).x(1,:), m.Data(:).x(2,:)); %this line doesn't work though, just for explanation's sake
It gave me an error saying that
Error using memmapfile/subsref (line 764) A subscripting operation on
the Data field attempted to create a comma-separated list. The
memmapfile class does not support the use of comma-separated lists
when subscripting.
I was referring to the solution here; the suggested solution used reshape to shape the matrix beforehand (in contrast to my method above). I try to avoid using reshape in my code, as I'm dealing with very large data and that might be computationally expensive and take a long time. Is there an alternative or better way to do this?
Thanks in advance!
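One possible reading of the error: the file holds 5x10x5 = 250 doubles, but the Format maps blocks of 5x5x5 = 125, so m.Data becomes a 2-element struct array and m.Data(:).x expands to a comma-separated list. A minimal sketch that avoids this, assuming the file was written exactly as above, is to map the whole 5x10x5 block as a single field and split the real and imaginary halves by indexing. Whether this is fast enough for very large data has not been measured here, and indexing into the mapped data still copies the selected values into memory.

```matlab
% Recreate the file from the question so the sketch is self-contained.
z = repmat(complex(rand(5),rand(5)),[1 1 5]);
adjacent = [real(z) imag(z)];          % 5x10x5: real part left, imaginary part right
fileID = fopen('complex.bin','w');
fwrite(fileID,adjacent,'double');
fclose(fileID);

% Map the whole block as one 5x10x5 field, so m.Data is a scalar struct.
m = memmapfile('complex.bin','Format',{'double',[5 10 5],'x'});

% Columns 1:5 hold the real parts, columns 6:10 the imaginary parts.
x = m.Data.x;
complexValues = complex(x(:,1:5,:), x(:,6:10,:));
```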

Simplest way to read space delimited text file matlab

Ok, so I'm struggling with the most mundane of things. I have a space-delimited text file with a header in the first row and a row per observation, and I'd like to open that file in MATLAB. If I do this in R I have no problem at all; it'll create the most basic matrix and voila!
But MATLAB seems to be annoying with this...
Example of the text file:
"picFile" "subjCode" "gender"
"train_1" 504 "m"
etc.
Can I get something like a matrix at all? I would then like to have MATLAB pull out some data by doing data(1,2) for example.
What would be the simplest way to do this?
It seems like having to write a loop using f-type functions is just a waste of time...
If you have a sufficiently new version of Matlab (R2013b+, I believe), you can use readtable, which is very much like how R does it:
T = readtable('data.txt','Delimiter',' ')
There are many functions for manipulating tables and converting back and forth between them and other data types such as cell arrays.
There are some other options in the data import and export section of the Statistics toolbox that should work in older versions of Matlab:
tblread: output in terms of separate variables for strings and numbers
caseread: output in terms of a char array
tdfread: output in terms of a struct
Alternatively, textscan should be able to accomplish what you need and probably will be the fastest:
fid = fopen('data.txt');
header = textscan(fid,'%s',3); % Optionally save header names
C = textscan(fid,'%s%d%s','HeaderLines',1); % Read data skipping header
fclose(fid); % Don't forget to close file
C{:}
Found a way to solve my problem.
Because I don't have the latest version of MATLAB and cannot use readtable, which would be the preferred option, I ended up using textread and specifying the format of each column.
Tedious, but maybe the "simplest" way I could find:
[picFile, subCode, gender]=textread('data.txt', '%s %f %s', 'headerlines',1);
T=[picFile(:) num2cell(subCode(:)) gender(:)]
The textscan solution by @horchler seems pretty similar. Thanks!
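For the data(1,2) style of access asked about in the question, a hedged sketch with textscan follows. It swaps in the %q specifier, which reads a field and drops surrounding double quotes, and it collects the columns into one cell array. The one-row file written at the top just mirrors the example, and the variable name data is illustrative.

```matlab
% Write the example file so the sketch is self-contained.
fid = fopen('data.txt','w');
fprintf(fid,'"picFile" "subjCode" "gender"\n"train_1" 504 "m"\n');
fclose(fid);

fid = fopen('data.txt');
header = textscan(fid,'%q',3);       % the three column names, quotes removed
C = textscan(fid,'%q %f %q');        % remaining rows: text, number, text
fclose(fid);

% Combine the columns into one cell array, one row per observation.
data = [C{1} num2cell(C{2}) C{3}];
subjCode = data{1,2};                % contents of row 1, column 2
```

Curly braces are needed to get the stored value; data(1,2) with parentheses returns a 1x1 cell.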

Inner matrix dimensions must agree error

I have a piece of code in which I save array values to a .txt file, and then in another function I have to retrieve those values from the .txt file into an array. The code looks somewhat like this:
fid = fopen('c:\\coeffs2.txt','wt');
fprintf(fid,'%f\n',descr2);
fclose(fid);
And in another file I retrieve it this way:
fid = fopen('c:\\coeffs2.txt');
des2= [];
des2 = fscanf(fid,'%f\n');
fclose(fid);
I get the error "Inner matrix dimensions must agree". Please help!
Are you sure these lines are the ones generating that error? Exactly what is the line where the error occurs? Normally this would happen if you did (for example) a matrix multiplication (*) when you intended to do element-by-element multiplication (.*) with a non square matrix...
You can use save('c:\\coeffs2.mat', 'descr2'); and load('c:\\coeffs2.mat'); as an alternative (and more efficient) way to store / retrieve the matrix, and be sure you didn't change the dimensions.
Did you try to see what size(descr2) gives before the save, and after retrieving? Maybe you just need a reshape...
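To illustrate the size check: fprintf with '%f\n' writes the matrix element by element in column order, and fscanf without a size argument returns a single column vector, so the original shape is lost on the way. A small sketch, assuming the original size is known when reading and using a made-up 2x3 matrix and a file in the current folder:

```matlab
% A small stand-in for the real data (hypothetical values).
descr2 = [1 2 3; 4 5 6];

% Write one value per line; fprintf walks the matrix column by column.
fid = fopen('coeffs2.txt','wt');
fprintf(fid,'%f\n',descr2);
fclose(fid);

% Read everything back; the result is a 6x1 column vector.
fid = fopen('coeffs2.txt');
des2 = fscanf(fid,'%f');
fclose(fid);
sizeAfterRead = size(des2);

% Restore the original shape before using the matrix in a product.
des2 = reshape(des2, size(descr2));
```

Note that '%f' writes six decimal places, so values with more significant digits would not round-trip exactly.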

Fastest way to import CSV files in MATLAB

I've written a script that saves its output to a CSV file for later reference, but the second script for importing the data takes an ungainly amount of time to read it back in.
The data is in the following format:
Item1,val1,val2,val3
Item2,val4,val5,val6,val7
Item3,val8,val9
where the headers are on the left-most column, and the data values take up the remainder of the row. One major difficulty is that the arrays of data values can be different lengths for each test item. I'd save it as a structure, but I need to be able to edit it outside the MATLAB environment, since sometimes I have to delete rows of bad data on a computer that doesn't have MATLAB installed. So really, part one of my question is: Should I save the data in a different format?
Second part of the question:
I've tried importdata, csvread, and dlmread, but I'm not sure which is best, or if there's a better solution. Right now I'm using my own script using a loop and fgetl, which is horribly slow for large files. Any suggestions?
function [data,headers]=csvreader(filename) %V1_1
fid=fopen(filename,'r');
data={};
headers={};
count=1;
while 1
    textline=fgetl(fid);
    if ~ischar(textline), break, end
    nextchar=textline(1);
    idx=1;
    while nextchar~=','
        headers{count}(idx)=textline(1);
        idx=idx+1;
        textline(1)=[];
        nextchar=textline(1);
    end
    textline(1)=[];
    data{count}=str2num(textline);
    count=count+1;
end
fclose(fid);
(I know this is probably terribly written code - I'm an engineer, not a programmer, please don't yell at me - any suggestions for improvement would be welcome, though.)
It would probably make the data easier to read if you could pad the file with NaN values when your first script creates it:
Item1,1,2,3,NaN
Item2,4,5,6,7
Item3,8,9,NaN,NaN
or you could even just print empty fields:
Item1,1,2,3,
Item2,4,5,6,7
Item3,8,9,,
Of course, in order to pad properly you would need to know beforehand what the maximum number of values across all the items is. With either format above, you could then use one of the standard file reading functions, like TEXTSCAN for example:
>> fid = fopen('uneven_data.txt','rt');
>> C = textscan(fid,'%s %f %f %f %f','Delimiter',',','CollectOutput',1);
>> fclose(fid);
>> C{1}
ans =
'Item1'
'Item2'
'Item3'
>> C{2}
ans =
1 2 3 NaN %# TEXTSCAN sets empty fields to NaN anyway
4 5 6 7
8 9 NaN NaN
Instead of parsing the string textline one character at a time, you could use strtok to break the string up. For example, the body of the outer while loop could look like this:
    stringParts = {};
    tline = fgetl(fid);
    if ~ischar(tline), break, end
    i=1;
    % split the line at each comma
    while 1
        [stringParts{i},r]=strtok(tline,',');
        tline=r;
        i=i+1;
        if isempty(r), break; end
    end
    % store the header
    headers{count} = stringParts{1};
    % convert the data into numbers
    for j=2:length(stringParts)
        data{count}(j-1) = str2double(stringParts{j});
    end
    count=count+1;
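The per-line loop can also be avoided altogether. The following is only a sketch of one alternative, not a benchmarked one: it reads all lines in a single textscan call, lets strtok split every line at its first comma at once, and converts the remainder of each line with sscanf. The file name uneven_data.csv is hypothetical, and the sample file is written first so the sketch runs on its own.

```matlab
% Write the ragged example file so the sketch is self-contained.
fid = fopen('uneven_data.csv','w');
fprintf(fid,'Item1,1,2,3\nItem2,4,5,6,7\nItem3,8,9\n');
fclose(fid);

% Read every line of the file as one string.
fid = fopen('uneven_data.csv','r');
L = textscan(fid,'%s','Delimiter','\n');
fclose(fid);
lines = L{1};

% strtok accepts a cell array: headers get the text before the first comma,
% rest keeps the remainder of each line, starting with that comma.
[headers, rest] = strtok(lines, ',');

% Parse the ",value,value,..." remainder of each line into a row vector.
data = cellfun(@(s) sscanf(s, ',%f').', rest, 'UniformOutput', false);
```

Each cell of data holds a row vector of its own length, so no padding with NaN is needed.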
I've had the same problem with reading CSV data in MATLAB, and I was surprised by how little support there is for this, but then I just found the Import Data tool. I'm on R2015b.
On the top bar in the "Home" tab, click on "Import Data" and choose the file you'd like to read. An app window will come up like this:
[Screenshot of the Import Data tool]
Under "Import Selection" you have the option to "generate function", which gives you quite a bit of customization options, including how to fill empty cells, and what you'd like the output data structure to be. Plus it's written by MathWorks, so it's probably utilizing the fastest available method to read csv files. It was almost instantaneous on my file.
Q1) If you know the maximum number of columns, you can fill empty entries with NaN.
Also, if all values are numerical, do you really need the "Item#" column? If yes, you can use only "#", so all data is numerical.
Q2) The fastest way to read numeric data from a file without mex-files is csvread.
I try to avoid using strings in csv files, but if I have to, I use my csv2cell function:
http://www.mathworks.com/matlabcentral/fileexchange/20135-csv2cell