Reading CSV files with MATLAB?

I am trying to read in a .csv file with MATLAB. Here is my code:
csvread('out2.csv')
This is what out2.csv looks like:
03/09/2013 23:55:12,129.32,129.33
03/09/2013 23:55:52,129.32,129.33
03/09/2013 23:56:02,129.32,129.33
On Windows I am able to read this exact same file with the xlsread function with no problems. I am currently on a Linux machine. When I first used xlsread to read the file, I got "File is not in recognized format", so I switched to csvread. However, with csvread I get the following error message:
Error using dlmread (line 139)
Mismatch between file and format string.
Trouble reading number from file (row 1u, field 2u) ==> /09/2013 23:55:12,129.32,129.33\n
Error in csvread (line 48)
m=dlmread(filename, ',', r, c)
I think the '/' in the date is causing the issue. On Windows, the 1st column is interpreted as a string. On Linux it seems to be interpreted as a number, so it tries to read the number and fails at the slash. This is what I think is going on, at least. Any help would be really appreciated.

csvread can only read doubles, so it's choking on the date field. Use textscan.
fid = fopen('out2.csv');
out = textscan(fid,'%s%f%f','delimiter',',');
fclose(fid);
date = datevec(out{1}, 'mm/dd/yyyy HH:MM:SS');
col1 = out{2};
col2 = out{3};
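To try the textscan approach without the original out2.csv, here is a minimal sketch that writes the three sample rows to a temporary file (the temporary file is only for demonstration) and parses them. It passes an explicit format to datenum instead of relying on automatic date detection; in datenum formats, mm is the month and MM is minutes.

```matlab
% Write the three sample rows from the question to a temporary file
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '03/09/2013 23:55:12,129.32,129.33\n');
fprintf(fid, '03/09/2013 23:55:52,129.32,129.33\n');
fprintf(fid, '03/09/2013 23:56:02,129.32,129.33\n');
fclose(fid);

% Read the timestamp as text and the two price columns as doubles;
% with Delimiter set to ',', the space inside the timestamp does not split it
fid = fopen(fname);
out = textscan(fid, '%s%f%f', 'Delimiter', ',');
fclose(fid);
delete(fname);

% Convert the text timestamps using an explicit format
t = datenum(out{1}, 'mm/dd/yyyy HH:MM:SS');
col1 = out{2};
col2 = out{3};
```

The second and first timestamps are 40 seconds apart, so t(2) - t(1) should be 40/86400 days.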
Update (8/31/2017)
Since this was written back in 2013, MATLAB's textscan function has been updated to directly read dates and times. Now the code would look like this:
fid = fopen('out2.csv');
out = textscan(fid, '%{MM/dd/uuuu HH:mm:ss}D%f%f', 'delimiter', ',');
fclose(fid);
[date, col1, col2] = deal(out{:});
An alternative, as mentioned by @Victor Hugo below (and currently my personal go-to for this type of situation), would be to use readtable, which accepts the same format string as textscan but assembles the results directly into a table object:
dataTable = readtable('out2.csv', 'Format', '%{MM/dd/uuuu HH:mm:ss}D%f%f', 'ReadVariableNames', false);
dataTable.Properties.VariableNames = {'date', 'col1', 'col2'};
dataTable
dataTable =
  3×3 table
           date             col1      col2
    ___________________    ______    ______
    03/09/2013 23:55:12    129.32    129.33
    03/09/2013 23:55:52    129.32    129.33
    03/09/2013 23:56:02    129.32    129.33

Unfortunately, the documentation for csvread clearly states:
M = csvread(filename) reads a comma-separated value formatted file, filename. The file can only contain numeric values.
Since / is neither a comma nor a numeric value, csvread produces an error.

You can use readtable, as it accepts files that mix text and numeric columns.
https://www.mathworks.com/help/matlab/ref/readtable.html

Yeah, xlsread requires Microsoft Excel for Windows to be installed. Without it, xlsread runs in 'basic' mode, and 'basic' mode only reads Excel workbook formats such as .xls, .xlsx and .xlsm, not .csv, which is why the file is rejected on Linux.
Another alternative is one of the many user-written functions posted on MATLAB's File Exchange that work on Linux and are more flexible at reading loosely formatted content.
One example:
https://www.mathworks.com/matlabcentral/fileexchange/75219-csv2cellfast-import-csv-files-on-machines-without-excel

Related

Table imported into Matlab from CSV, variable name prefixed by "x___"

I have a bunch of automatically generated CSV files with headers, which I'd like to import into MATLAB R2016a as a table. I used code such as
T = readtable('d:\test.csv', 'readvariablenames', true);
However, even though the name of the CSV's first column is "runNr", the first column in the MATLAB table gets named "x___runNr".
This clearly has something to do with the CSV files being in a slightly different format from the one MATLAB expects. For instance, my CSVs might have a Byte Order Mark at the beginning. Still, I am not sure how to fix this, since I cannot change the format of the CSVs, and readtable gives me the output format I am most comfortable with.
Upon calling readtable, the following warning is issued:
"Warning: Variable names were modified to make them valid MATLAB identifiers. "
However, some of my CSVs (perhaps produced by a different version of the software that outputs them) are read correctly, and the same warning is displayed for those too, so the warning alone does not indicate the problem.
I think I found the source of the problem:
As you suspected, the encoding of your CSV file is "UTF-8-BOM" (I saw it using Notepad++).
The UTF-8 representation of the BOM is the (hexadecimal) byte sequence 0xEF,0xBB,0xBF.
MATLAB R2019a knows to ignore the first 3 bytes, but R2016a is "confused" by them and adds the x___ prefix to runNr.
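The three bytes above are the UTF-8 encoding of the Unicode character U+FEFF, which is what the BOM is. A quick way to confirm this in MATLAB, as a small illustrative check rather than part of the workaround:

```matlab
% U+FEFF is the byte order mark character; encode it as UTF-8 bytes
bomBytes = unicode2native(char(65279), 'UTF-8');

% One hex pair per row: EF, BB, BF
hexBytes = dec2hex(double(bomBytes));
```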
A workaround is to create a temporary file without the first 3 bytes:
f = fopen('test.csv', 'r');
A = fread(f, '*uint8');   % read the raw bytes of the file
fclose(f);
% Guard against files shorter than 3 bytes before comparing with the BOM
if numel(A) >= 3 && all(A(1:3) == hex2dec(['EF'; 'BB'; 'BF']))
    f = fopen('tmp.csv', 'w');
    fwrite(f, A(4:end));   % skip the first 3 bytes (the BOM)
    fclose(f);
    T = readtable('tmp.csv', 'readvariablenames', true);
    delete('tmp.csv');     % clean up the temporary copy
else
    T = readtable('test.csv', 'readvariablenames', true);
end
There might be more efficient solutions (like simply removing the x___).
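A minimal sketch of the "simply remove the x___" idea, assuming the table was read with readtable as in the question and that only the first variable name is affected (the BOM sits at the very start of the file). The names below are hypothetical stand-ins for T.Properties.VariableNames:

```matlab
% Hypothetical names, as R2016a might produce them for a BOM-prefixed file.
% In real use: names = T.Properties.VariableNames;
names = {'x___runNr', 'value', 'flag'};

% Strip the prefix only from the first name, and only at its start
names{1} = regexprep(names{1}, '^x___', '');

% In real use, assign back: T.Properties.VariableNames = names;
```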

Matlab textscan formatspec delimiter error

When reading a large CSV file, MATLAB doesn't accept ||,|| as a delimiter argument for textscan. The data is as follows (simplified):
||X||,||Y||,||Z|| (header)
||1||,||2||,||4||
||4||,||4||,||3||
etc.
I use data = textscan(fileID,formatSpec,'Delimiter',','); to read in the data with some format spec '%f %f %f'.
My rubber band solution has been to use 010 editor to replace all '||' with '', making it a proper csv file for matlab, but due to the size of the document (6M lines with approx 35 fields) and the frequency of new documents this is hardly a great solution.
Does anyone know a proper way to import such a file?
You should be able to include the || characters as literal text in the format specifier:
data = textscan(fid, '||%f||,||%f||,||%f||', 'headerlines', 1)
and then just leave out the delimiter.
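A self-contained sketch of this idea, using a temporary file that holds the simplified sample from the question. Because the real file has about 35 fields, the format is generated with repmat and strjoin rather than typed by hand; nFields is a hypothetical parameter you would set to the real column count.

```matlab
% Write the simplified sample from the question to a temporary file
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '||X||,||Y||,||Z||\n');
fprintf(fid, '||1||,||2||,||4||\n');
fprintf(fid, '||4||,||4||,||3||\n');
fclose(fid);

% Build '||%f||,||%f||,||%f||' for nFields columns (35 in the real file)
nFields = 3;
fmt = strjoin(repmat({'||%f||'}, 1, nFields), ',');

% The bars and commas are matched as literal text, so no Delimiter is set;
% HeaderLines skips the X/Y/Z row
fid = fopen(fname);
data = textscan(fid, fmt, 'HeaderLines', 1);
fclose(fid);
delete(fname);
```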
Edit (Following on from comments)
If you are trying to read in strings, the trick is to get it to read in strings without the | character. This is done using %[^|], like this:
data = textscan(fid, '|| %[^|] ||,|| %[^|] ||,|| %[^|] ||', 'headerlines', 1)
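A minimal sketch of the %[^|] variant on the same simplified sample (again written to a temporary file for demonstration). Without HeaderLines, the header row comes back as text too, so the numbers have to be converted afterwards with str2double:

```matlab
% Write the simplified sample to a temporary file
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '||X||,||Y||,||Z||\n||1||,||2||,||4||\n||4||,||4||,||3||\n');
fclose(fid);

% %[^|] reads every character up to the next |, so each field comes back
% as text without the surrounding bars
fid = fopen(fname);
txt = textscan(fid, '||%[^|]||,||%[^|]||,||%[^|]||');
fclose(fid);
delete(fname);

header = {txt{1}{1}, txt{2}{1}, txt{3}{1}};   % first row holds the names
firstCol = str2double(txt{1}(2:end));         % remaining rows as numbers
```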

How do I export a matrix in MATLAB?

I'm trying to export a matrix f that is double. My data in f are real numbers in three columns. I want a txt file as an output with the columns separated by tabs. However, when I try the dlmwrite function, just the first column appears as output.
for k = 1:10
f = [idx', firsttime', sectime'];
filename = strcat(('/User/Detection_rerun/AF_TIMIT/1_state/mergedlabels_train/'),(files_train{k,1}),'.lab');
dlmwrite(filename,f,'\t') ;
end
When I use dlmwrite(filename,f,'\t','newline','pc') ; I keep getting an error Invalid attribute tag: \t . I even tried 'tab' instead of '\t' but a similar error appears. Please let me know if you have any suggestions. Thank you.
This is because you are not calling dlmwrite properly. To specify the delimiter together with other options such as newline, you must use the delimiter flag, followed by the specific delimiter you want. In your case, that is \t. In other words, you need to do this:
for k = 1:10
f = [idx', firsttime', sectime'];
filename = strcat(('/User/Detection_rerun/AF_TIMIT/1_state/mergedlabels_train/'),(files_train{k,1}),'.lab');
dlmwrite(filename,f,'delimiter','\t') ;
end
BTW, you are using the newline flag with pc, which writes Windows-style line endings (a carriage return followed by a line feed). I suggest you leave this out and let MATLAB use its default. Only force the newline characters if you know what you're doing.
FWIW, the MATLAB documentation is pretty clear about delimiters and other quirks about the function: http://www.mathworks.com/help/matlab/ref/dlmwrite.html
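A small round-trip check of the name-value form, with hypothetical column vectors standing in for idx, firsttime and sectime (the question's code transposes them, which suggests they are row vectors there; that is an assumption). Reading the file back with dlmread confirms that all three columns were written:

```matlab
% Hypothetical stand-ins for idx, firsttime and sectime, as column vectors
idx = (1:3)';
firsttime = [0.5; 1.25; 2];
sectime = [0.75; 1.5; 2.25];
f = [idx, firsttime, sectime];   % 3x3 matrix, one column per variable

% Name-value form: 'delimiter' followed by the tab escape sequence
fname = [tempname '.lab'];
dlmwrite(fname, f, 'delimiter', '\t');

% Read the file back to confirm every column made it to disk
back = dlmread(fname, '\t');
delete(fname);
```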

Reading portion of CSV with multiple data types

I have a CSV file with both numbers and letters that I want to read. The file also has headers (first row), but I can read them separately, so that's not a concern.
What I can't solve is that the file has multiple data types and that I only want to read a portion of it (since the file is very large), say the first 5000 rows.
I've tried xlsread with three outputs, but I get the following error: "??? Error: Object returned error code: 0x800A03EC". I've also tried textscan, but if I understood correctly you have to specify the variable types as an input, and that's not very practical for me since I have a large number of columns. I hope this is not a duplicate; I've read other solutions, but I could not apply them to my problem.
Is there a way to do this?
Thank you in advance
To test the problem, I created a small test.csv file.
It contains the following lines:
header1;header2;header3
a;1;xx
b;2;yy
c;3;zz
d;4;xxx
e;5;yyy
I use the following code to read the data:
range = 'A2:C3'
[num, text, both] = xlsread('test.csv', 1, range)
Output of the both variable, that contains the text and numbers, is as expected:
both =
'a' [1] 'xx'
'b' [2] 'yy'
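Note that xlsread needs Excel to read a .csv file, so this may not work on every machine. A portable alternative, which is a different technique from the answer above, is textscan with a repeat count: textscan(fid, fmt, N, ...) applies the format N times and then stops, so only the first N rows are parsed. A sketch on the same test.csv contents, assuming the column types are known; colTypes is a hypothetical list you would fill in for the real file:

```matlab
% Recreate the sample test.csv from above in a temporary file
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'header1;header2;header3\na;1;xx\nb;2;yy\nc;3;zz\nd;4;xxx\ne;5;yyy\n');
fclose(fid);

% One conversion specifier per column, concatenated into the format string
colTypes = {'%s', '%f', '%s'};
fmt = [colTypes{:}];

nRows = 2;                  % e.g. 5000 for the real file
fid = fopen(fname);
header = fgetl(fid);        % read the header line separately
data = textscan(fid, fmt, nRows, 'Delimiter', ';');
fclose(fid);
delete(fname);
```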

Matlab : Read a file name in string format from a .csv file

I have a .csv file which contains, let's say, 50 rows.
At the beginning of each row I have a file name in the format 001_02_03.bmp, followed by values separated by commas. Something like this:
001_02_03.bmp,20,30,45,10,40,20
Can someone tell me how can I read the first column from the data?
I know how to obtain the data from the second column onward. I am using the csvread function like this: X = csvread('filename.csv', 0, 1);. If I try to read the first column in the same manner, it outputs an error saying that csvread does not support strings.
Use textscan, i.e.:
fid1 = fopen(csvFileName);
X = textscan(fid1, '%s%f%f%f%f%f%f', 'Delimiter', ',');
fclose(fid1);
FirstCol = X{1, 1};
A little more detail? csvread only works with purely numeric data, so you can't use it to get in data with a .bmp, or underscores for that matter. Thus we use textscan. The funny looking format string input to textscan is just saying that the columns are, in order, of type string %s, then the next 6 columns are of type double %f%f%f%f%f%f (or you might choose to alter this to reflect an integer datatype - I personally rarely bother unless the quantity of data is huge or floating point precision is a problem).
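For files with many columns, the format string does not have to be typed out. A sketch that builds it with repmat and, for a quick test, passes the sample line from the question directly to textscan, which also accepts a character vector in place of a file identifier:

```matlab
% One text column followed by six numeric columns
fmt = ['%s', repmat('%f', 1, 6)];

% Parse the sample line from the question directly from a character vector
sampleLine = '001_02_03.bmp,20,30,45,10,40,20';
X = textscan(sampleLine, fmt, 'Delimiter', ',');

FirstCol = X{1};          % cell array holding the file name
values = [X{2:end}];      % the six numbers as a row vector
```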
Note, if you just wanted to get the first column and ignore the rest, you can replace the format string with %s %*[^\n]. A final point: if your CSV file has a header line, you can skip it using the HeaderLines optional input to textscan.
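A minimal sketch combining both points, on a temporary file with a hypothetical header line and a made-up second data row:

```matlab
% Sample file: hypothetical header line, the row from the question,
% and a made-up second row
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'name,a,b,c,d,e,f\n');
fprintf(fid, '001_02_03.bmp,20,30,45,10,40,20\n');
fprintf(fid, '001_02_04.bmp,21,31,46,11,41,21\n');
fclose(fid);

% %s reads the file name; %*[^\n] reads and discards the rest of the line
fid = fopen(fname);
C = textscan(fid, '%s %*[^\n]', 'Delimiter', ',', 'HeaderLines', 1);
fclose(fid);
delete(fname);

FirstCol = C{1};
```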