Import csv files in MATLAB - matlab

I'm trying to import a csv file (7816 x 119) with a lot of tiny numbers (between 1.0E-11 and 1.0E-9) using the following code:
filename = 'dataset.csv';
D = importdata(filename,',',1);
When I check the import result I obtain
D=
data: [187x119 double]
textdata: {1x119 cell}
colheaders: {1x119 cell}
Note that the size of D is a lot smaller than the original data size.
When I do the same process with a matrix with bigger numbers (not scientific notation) I don't have any problem.
I'm wondering whether MATLAB has a limit on the size of a csv file it can import, or restrictions on numbers in scientific notation?

As I suspected, your data is corrupted in some places. Search for 'DIV' in the file; you will find the entry '#DIV/0!' several times. Interestingly, this worked for me in some MATLAB version (I don't currently know the version number), and it also works in Octave with a current release.
Here the test:
D = csvread('data_matlab.csv', 1, 0);
gives
Error using dlmread (line 143)
Mismatch between file and format string.
Trouble reading 'Numeric' field from file (row number 187, field number 72) ==>
#DIV/0!,1.11E-08,0,9.28E-09,2.8E-09,0.000000031,1.99E-08,6.49E-10,1.75E-09,9.66E-09,8.47E-10,3.82E-09,2.41E-10,1.71E-09,5.48E-09,1.32E-09,8.73E-09,2.05E-09,8.89E-10,3.83E-10,0,1.36E-08,2.92E-09,3.08E-...
Error in csvread (line 47)
m=dlmread(filename, ',', r, c);
Where do you get the data from? Can you influence the output? If you can't, replace the erroneous entries by hand (using an appropriate tool) or use @Trogdor's answer.
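If editing the file by hand is not practical, one option is to let textscan treat the '#DIV/0!' token as an empty numeric field, which it then fills with NaN. The following is only a sketch under assumptions: it builds a tiny stand-in file (the file name, header and values are made up for illustration) rather than using the real 7816 x 119 dataset, so nCols would need to be set to 119 for the real file.

```matlab
% Build a tiny CSV with one Excel error token to mimic the problem
% (hypothetical stand-in for the real dataset).
fname = [tempname '.csv'];
fid = fopen(fname, 'wt');
fprintf(fid, 'a,b,c\n');
fprintf(fid, '1.0E-11,2.5E-10,3.0E-9\n');
fprintf(fid, '4.0E-11,#DIV/0!,6.0E-9\n');
fprintf(fid, '7.0E-11,8.5E-10,9.0E-9\n');
fclose(fid);

nCols = 3;  % assumption: number of numeric columns in the file
fid = fopen(fname, 'rt');
% 'TreatAsEmpty' turns the error token into an empty numeric field,
% which textscan fills with NaN by default.
C = textscan(fid, repmat('%f', 1, nCols), ...
    'Delimiter', ',', 'HeaderLines', 1, ...
    'TreatAsEmpty', {'#DIV/0!'}, 'CollectOutput', true);
fclose(fid);
delete(fname);

D = C{1};   % numeric matrix, NaN where '#DIV/0!' appeared
```

All rows are read, and the bad entries can afterwards be located with isnan(D).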

I was able to replicate the problem. Using xlsread did not produce that issue:
filename = 'data_matlab.csv';
d = xlsread(filename);

Related

Reading numeric data from CSV file Matlab for specific columns?

I have two .csv files which I am trying to read into Matlab as numeric matrices. The first, list_a, simply has two columns of ID numbers and corresponding values (approx. 50000 lines) with a ',' delimiter. The second, list_b, has 6 columns with a ';' delimiter. I am only interested in the first two columns, which contain numbers; the other columns contain text that I don't care about.
I initially tried using the readtable function in Matlab but noticed that these values aren't stored as numeric values, which is a requirement I have. I couldn't figure out how to cast these as integers after reading them either.
For list_a I have used the dlmread function, which I believe reads the file as numeric values.
For list_b I have tried using the dlmread function in which row and column offsets can be specified (https://www.mathworks.com/help/matlab/ref/dlmread.html#d117e329603) - the problem here is however, that the length of the file could change in the future, so I'm not sure what to enter for the row offsets.
I'm also not sure I understand how this function works, considering I tried testing it for the first 1000 rows as follows:
csv_matrix = dlmread(csv_fullpath,';',[1 1 1000 2]);
and subsequently got the following error message - even though "field number 3" shouldn't even be included in the first place:
Error using dlmread (line 147)
Mismatch between file and format character vector.
Trouble reading 'Numeric' field from file (row number 1, field number 3) ==>
RandomTextInFile\n
Error in Damage_List_Reader (line 15)
csv_matrix = dlmread(csv_fullpath,';',[1 1 1000 3]);
I get the impression that I'm making this problem a lot harder than it needs to be so if there's an all around better way to do this, I'm all ears.. Thanks!
I would suggest using fopen in combination with textscan (e.g. for list_a) like this:
file = fopen('list_a.csv');
out = textscan(file, '%d%f', 'delimiter', ',');
fclose(file);
ID = out{1};
Vals = out{2};
'%d%f' is the FormatSpec, i.e. it describes how the data is formatted in the file. With this, you can capture any data from a csv file (and also omit data). I recommend reading the textscan Matlab doc for further formatting issues.
P.S.: I think you can put an "end" (without the quotation marks) instead of one of the row offset values if the number of rows/cols isn't fixed.
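For list_b, the same textscan approach can read the first two columns and skip the rest of each line, so the number of rows never has to be specified. This is a sketch under assumptions: the file contents below are invented for illustration, the file is assumed to have no header line, and both columns are read as doubles with '%f'.

```matlab
% Hypothetical stand-in for list_b: 6 columns, ';' delimiter, text after column 2.
fname = [tempname '.csv'];
fid = fopen(fname, 'wt');
fprintf(fid, '101;0.5;some text;abc;def;ghi\n');
fprintf(fid, '102;1.25;more text;abc;def;ghi\n');
fprintf(fid, '103;2;other;abc;def;ghi\n');
fclose(fid);

fid = fopen(fname, 'rt');
% '%f%f' reads the two numeric columns; '%*[^\n]' skips the remainder of the line.
out = textscan(fid, '%f%f%*[^\n]', 'Delimiter', ';');
fclose(fid);
delete(fname);

csv_matrix = [out{1} out{2}];   % N-by-2 numeric matrix, N taken from the file
```

If the real file starts with a header row, adding 'HeaderLines', 1 to the textscan call should skip it.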

Confused with .tsv files in MATLAB (converting to a Matrix?)

I have a .tsv file that I wish to open in MATLAB, however I am having several problems with this.
I have tried the following
fid = fopen('data.tsv');
C = textscan(fid, ['%s' repmat('%f',1,8)], 'HeaderLines', 1);
fclose(fid);
and got some weird values that had nothing to do with my file. I also tried:
data = dlmread('data.tsv', '\t');
and got this
Error using dlmread (line 139)
Mismatch between file and format string.
Trouble reading number from file (row 1u, field 1u) ==> Participant Assessment
Experiment Block Trial
Answer Reaction Timestamp Free Response\n
Is there some way I can get it to ignore the header, or am I doing it totally wrong?
With dlmread you can specify where to start reading in the file. This is one of the few times that MATLAB indexing begins at 0 - [0,0] is the first row, first column. Therefore, to ignore the first row (containing your header):
data = dlmread('data.tsv','\t', 1, 0);
This will only work if all the values (other than the header lines you skip) are numeric.
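To illustrate the zero-based offsets, here is a small sketch on a made-up tab-separated file (the column names and values are assumptions, and every data column is numeric, which is the case where dlmread applies).

```matlab
% Hypothetical all-numeric TSV with a single header line.
fname = [tempname '.tsv'];
fid = fopen(fname, 'wt');
fprintf(fid, 'Block\tTrial\tReaction\n');
fprintf(fid, '1\t1\t0.5\n');
fprintf(fid, '1\t2\t0.75\n');
fclose(fid);

% Row offset 1 skips the header; column offset 0 starts at the first column.
data = dlmread(fname, '\t', 1, 0);
delete(fname);
```

Here data is a 2-by-3 double matrix holding only the rows below the header.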
Your example with textscan also looks fine to me (provided that the format supplied is correct and there is indeed only one header line). C will be a cell array; to obtain the data from each column use C{n} where n is the column number.
Rather than skipping the header line, it's sometimes useful to just read it in to a separate value:
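fid = fopen('data.tsv');
% Read the 9 tab-separated header names; the tab delimiter keeps 'Free Response' as one field.
C_header = textscan(fid, '%s', 9, 'Delimiter', '\t');
C = textscan(fid, ['%s' repmat('%f',1,8)], 'Delimiter', '\t');
fclose(fid);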

Error "Index exceeds matrix dimensions." in MATLAB using importdata with a text file

I'm trying to import data using importdata and when I try to parse the returned data to create a matrix I get, "Index exceeds matrix dimensions". Below is my code...
traindata = importdata('textfile.txt');
%[A,delimiterOut,headerlinesOut] = importdata('textfile.txts');
disp(traindata); %everytime I run this code traindata increments by 1
X = traindata(' ',1:8); %this is where the error occurs, delimiter is 3 spaces
Y = traindata(' ',9);
Below is the format of the data in textfile.txt...
,,,5.4,,,0.0,,,0.0,,,1.6,,,2.5,,,1.0,,,6.7,,,2.8,,,6.1
,,,4.2,,,1.1,,,3.6,,,3.9,,,1.8,,,9.3,,,3.3,,,2.4,,,7.6
The data is delimited by spaces (I used commas to try to show the spaces between the data) with a newline at the end of each line. I've opened textfile.txt in Word and verified this by viewing the hidden formatting characters. I've tried the code...
[A,delimiterOut,headerlinesOut] = importdata(inputfile);
to try to verify the delimiter used and I get the error, "Too many output arguments." As you can see I'm trying to create two matrices (X,Y) from the imported data. I've seen this specific error on stackoverflow but nothing regarding importdata. I've also tried dlmread and have not had luck. Thanks in advance for any help.
Tried the suggestion of importing the data using file->import data but I receive the error..
Error using importdata
Too many output arguments.
"Error in uiimport/runImportdata (line 433)
[datastruct, OTextDelimiter, OHeaderLines] = ...
Error in uiimport/gatherFilePreviewData (line 376)
[datastruct, textDelimiter, headerLines]= runImportdata(fileAbsolutePath, type);
Error in uiimport (line 194)
[ctorPreviewText, ctorHeaderLines, ctorDelim] = ..."
I'm starting to wonder if it's some sort of application bug. Here are some specifics..
"R2012a (7.0.14.739) 64 bit (Win64)". The encoding of the text file is utf-8. Thanks again for the help!
Looks like the array returned from importdata is a 1 element array.
traindata = importdata('textfile.txt');
fprintf('1st element in array %d\n', traindata(1)); % prints a number that increases each time I run this function, i.e. 1,2,3,4...
fprintf('2nd element in array %d\n', traindata(2)); % produces error, "Index exceeds matrix dimensions"
I often find it useful to use MATLAB's built-in GUI for importing a data file, which can help to visualise how the data will be imported. It has an option to generate the code that replicates the options selected during the import, which lets you work out how to import the data programmatically.
Just go to:
File >>> Import Data...

MATLAB uint8 data type variable save

Does anyone know how to save a uint8 workspace variable to a txt file?
I tried using MATLAB save command:
save zipped.txt zipped -ascii
However, the command window displayed this warning:
Warning: Attempt to write an unsupported data type to an ASCII file.
Variable 'zipped' not written to file.
In order to write it, simply cast your values to double before writing it.
A=uint8([1 2 3])
toWrite=double(A)
save('test.txt','toWrite','-ASCII')
The reason uint8 can't be written is hidden in the format section of the save doc online; it took me a bit to find it.
The doc page is here: http://www.mathworks.com/help/matlab/ref/save.html
The 3rd line after the table in the format section (about halfway down the page) says:
Each variable must be a two-dimensional double or character array.
Alternatively, dlmwrite can write matrices of type uint8, as the other poster also mentioned, and I am sure the csv one will work too, but I haven't tested it myself.
Hopefully that will help you out, kinda annoying though! I think uint8 is used almost exclusively for images in MATLAB, but I am assuming writing the values as an image is not feasible in your situation.
Have you considered other write-to-file options in Matlab?
How about dlmwrite?
Another option might be csvwrite.
For more information see this document.
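A minimal round-trip sketch with dlmwrite, assuming a small made-up matrix in place of the real zipped variable: the uint8 values are written as comma-separated text and read back with dlmread, which returns doubles that are then cast back to uint8.

```matlab
zipped = uint8([1 2 3; 250 251 255]);   % stand-in for the workspace variable
fname = [tempname '.txt'];

dlmwrite(fname, zipped);                % comma-separated by default
restored = uint8(dlmread(fname));       % dlmread returns double, so cast back
delete(fname);
```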
Try the following:
%# a random matrix of type uint8
x = randi(255, [100,3], 'uint8');
%# build format string
frmt = repmat('%u,',1,size(x,2));
frmt = [frmt(1:end-1) '\n'];
%# write matrix to file in one go
f = fopen('out.txt','wt');
fprintf(f, frmt, x');
fclose(f);
The resulting file will be something like:
16,108,149
174,25,138
11,153,222
19,121,68
...
where each line corresponds to a matrix row.
Note that this is much faster than using dlmwrite, which writes one row at a time.

reading unformatted fortran file in matlab - which precision?

I have just written out a file:
real*8 :: vol_cell
real*8, dimension(256,256,256) :: dense
[... some operations]
open(unit=8,file=fname,form="unformatted")
write(8)dense(:,:,:)/vol_cell
close(8)
My code to read this in in Matlab:
fid = fopen(fname,'r');
mesh_raw = fread(fid,256*256*256,'double');
fclose(fid);
The min and max values clearly show that it is not reading it in correctly (Min is 0 and max is a largish positive real*8).
min =
3.3622e+38
max =
-3.3661e+38
What precision do I need to set in Matlab to make it read in the unformatted Fortran file?
A somewhat related question: the Matlab code I am using reads binary files OK, but not unformatted files. I am generating this new data on Mac OS X using gfortran, which doesn't recognize form="binary", so I can't do it that way. Do I need to add some library, or is this an endian problem?
===== Progress =====
OK progress. Instead of my ndim*ndim*ndim matrix I just wrote out the x values (column vector) as such:
open(unit=8,file=fnamex,form="unformatted")
write(8)x0
close(8)
Then Matlab reads:
fid = fopen(nfilename,'r');
hr3=fread(fid, 1, 'int32');
x0 = fread(fid,Ntot,'float32');
hr4=fread(fid, 1, 'int32');
fclose(fid);
THAT worked. Then I tried the original ndim**3 matrix, I tried to read with:
fid = fopen(nfilename,'r');
hr3=fread(fid, 1, 'int32');
mesh_raw = fread(fid,ndim*ndim*ndim,'float32');
hr4=fread(fid, 1, 'int32');
fclose(fid);
But that gives me garbage. Perhaps here:
real*4, dimension(:), allocatable :: x0
real*8, dimension(256,256,256) :: dense
Do I need to change: mesh_raw = fread(fid,ndim*ndim*ndim,'float32'); to make sure it is reading a real*8? What would that be? Surely just using 'real*8' verbatim should work? I mean 'real*4' for the x vector works. I mean it reads "dense" but the min/max/average values are wrong.
Your Fortran code shows you writing what is known as an unformatted sequential file. This is a record-based file format. A typical implementation (Fortran compiler/platform specific) is for the compiler to write additional record structure information to the file; often (gfortran included) the record length is written at the start and end of each record. Your original Matlab code does not appear to take that into account.
Fortran 2003 introduced stream access (add the ACCESS='STREAM' specifier to the OPEN statement). gfortran supports this feature; FORM='BINARY' is a non-standard synonym on some compilers. An unformatted file created with stream access has no record structure: it is a stream of bytes akin to a C stream. This may be more appropriate for you.
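The record layout described above can be sketched entirely in MATLAB. This is an illustration under assumptions, not a statement about every compiler: it assumes 4-byte record markers (the usual gfortran default), native byte order, and a small 4x4x4 array written by MATLAB itself to imitate a single Fortran record of real*8 values.

```matlab
n = 4;                                      % small stand-in for ndim = 256
dense = reshape(1:n^3, [n n n]) / 7;        % made-up real*8 data

% Imitate one sequential unformatted record: length, payload, length.
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
fwrite(fid, 8*n^3, 'int32');                % leading marker: payload size in bytes
fwrite(fid, dense, 'double');               % payload, column-major like Fortran
fwrite(fid, 8*n^3, 'int32');                % trailing marker
fclose(fid);

% Read it back, consuming both markers around the payload.
fid = fopen(fname, 'r');
nbytes  = fread(fid, 1, 'int32');
raw     = fread(fid, nbytes/8, 'double');   % 8 bytes per real*8 value
trailer = fread(fid, 1, 'int32');
fclose(fid);
delete(fname);

mesh_raw = reshape(raw, [n n n]);           % same element order as the Fortran array
```

Comparing nbytes with trailer is a cheap sanity check: if they differ, the element type or the byte order assumed for the file is probably wrong.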
This is most likely an endian problem: by my rough guess, reading the bytes in the other order gives a much more reasonable number. I'm not sure exactly what the solution is, so here are 3 possible options, one of which should fix the problem. It depends on your source file.
The trick is simply to change the fopen statement to one of the following:
fid = fopen(fname,'r','n'); %Native format (Default usually)
fid = fopen(fname,'r','l'); %Little Endian
fid = fopen(fname,'r','b'); %Big Endian
fid = fopen(nfilename,'r');
hr3=fread(fid, 1, 'int32');
mesh_raw = fread(fid,ndim*ndim*ndim,'float32');
hr4=fread(fid, 1, 'int32');
fclose(fid);
This is correct, except that since you are writing real*8 in Fortran, you need to have
mesh_raw = fread(fid,ndim*ndim*ndim,'double');