MATLAB: How to import multiple CSV files with mixed data types - matlab

I have just started learning MATLAB and am having difficulty importing CSV files into a 2-D array.
Here is a sample CSV for my needs (all the CSV files are in the same format with fixed columns):
Date, Code, Number....
2012/1/1, 00020.x1, 10
2012/1/2, 00203.x1, 0300
...
As csvread() only works with numeric data, should I import numeric data and text data separately, or is there a quick way to import multiple CSV files with mixed data types?
Thanks a lot!!

What you're looking for is maybe the function xlsread.
It opens any file recognized by Excel, and automatically separates text data from numerical data.
The problem is that the default list separator, at least on my computer, is ; and not , (that is the default for my locale here in Brazil). So xlsread will try to split the fields in the file on ; rather than on a comma as you'd like.
To change that, you have to change your system locale settings to make the comma the list separator. If you feel like it, in Windows Vista click Start, Control Panel, Regional and Language Options, Customize this format, and change the List Separator from ';' to ','. On other versions of Windows the process should be almost the same.
After doing that, typing:
[num, txt, all] = xlsread('your_file.csv');
will return something like:
num =
10
300
txt =
'01/01/2012' ' 00020.x1'
'02/01/2012' ' 00203.x1'
all =
'01/01/2012' ' 00020.x1' [ 10]
'02/01/2012' ' 00203.x1' [300]
Notice that if your locale already has the list separator set to ',', you won't have to change anything on your system to make that work.
If you don't want to change your system just to use the xlsread function, you could use the textscan function described here: http://www.mathworks.com/help/techdoc/ref/textscan.html
The drawback is that it is not as simple as a single call: you have to open the file with fopen, tell MATLAB the format of each column explicitly, and close the file afterwards.
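As a rough sketch of the textscan route for the sample in the question: the snippet below writes the sample rows to a temporary file (only so that it is self-contained) and reads them back as two text columns and one numeric column. The variable names and the temporary file are assumptions for illustration, not part of the original question.

```matlab
% Write a small sample file so the sketch is self-contained (data from the question).
fname = [tempname() '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'Date, Code, Number\n');
fprintf(fid, '2012/1/1, 00020.x1, 10\n');
fprintf(fid, '2012/1/2, 00203.x1, 0300\n');
fclose(fid);

% Read it back: two text columns and one numeric column, skipping the header line.
fid = fopen(fname, 'r');
C = textscan(fid, '%s %s %f', 'Delimiter', ',', 'HeaderLines', 1);
fclose(fid);
delete(fname);

dates  = strtrim(C{1});   % cell array of date strings
codes  = strtrim(C{2});   % cell array of code strings, stray spaces removed
values = C{3};            % numeric column vector
```

For several files with the same layout, the read step could be wrapped in a loop over the output of dir('*.csv').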
Best regards

I recently wrote a function that solves exactly this problem. See delimread.
It's worth noting that xlsread on csv files only works on Windows. On Linux or Mac, xlsread works in 'basic' mode, which cannot read csv files. It might not be a great idea in the long run to use xlsread in case you need to migrate across platforms or automate code runs on Linux servers.
xlsread is also much slower than other text parsing functions since it opens an Excel session to read the file.

Related

MATLAB stops responding when using xlsread to read a large spreadsheet

I am trying to use the xlsread function to read spreadsheets of about 6000x2700 cells (xlsx files).
I have two questions:
First, when I use something like
[num,txt,~]=xlsread(input_file,input_sheet,'A1:CYY6596')
MATLAB keeps showing 'busy' and stops responding (while I can open the file in Excel within 30 seconds).
Is there any solution if I don't want to loop through ranges of the xlsx file? In other words, can I just dump a spreadsheet of this size into MATLAB using xlsread?
Alternatively, maybe I can use loops to read these files range by range, but I cannot identify the last column of each spreadsheet unless I read the whole file first. If I cannot identify the last column, it is hard to set up the loops and process the file.
So my second question is: is there a way to identify the last column of the spreadsheet without reading the whole spreadsheet?
Thanks.
EDIT: However, if I run similar code that only reads the first 400 columns ('A1:RY6596') of the spreadsheet, the problem doesn't happen.
Which version of MATLAB are you using?
MATLAB has trouble loading big Excel files.
Convert the Excel file to CSV and use M = csvread(filename).
You can also try converting the .xlsx to .xls.
You can also try the tool on the File Exchange.

How can I get around MATLAB's specifications of csvread?

I am trying to create a program that takes in multiple csv files. However, they include both strings and numbers.
I have csv files that looks something like this:
"Project","Task","Value Type", "Value"
"105", "06.05.02", "cost", "3434"
"105", "06.05.02", "obligation", "3434"
"106", "06.05.02", "cost", "500"
"106", "06.05.02", "obligation", "500"
The number of columns is fixed (there are actually 23, I only listed four for readability), but each csv has a different number of rows. If I save it as an xls file, it works perfectly. However, this takes too long if there is a lot of files and the end user doesn't want to deal with that.
Similar questions suggested textread, but the first row would be
textread('filename.csv', '%s%s%s%s', 'delimiter', ',');
while the rest of the file is
textread('filename.csv', '%f%s%s%f', 'delimiter', ',');
In comparison to the simplicity of having the numbers, strings, and raw data in corresponding arrays using xlsread, having 23 different arrays seems a bit complicated.
What would be the best solution here?
The files are large, but not large enough that I am worried about efficiency.
Is there a way to change the extension of the files from .csv to .xls from within my program? (I looked this up as well, but couldn't find anything that worked) I would really like to use xlsread, but if this isn't possible, is there a way to have textread save the first row of a csv with certain variable types(%s%s%s%s) and then save the rest of the rows with a different variable type (%f%s%s%f)?
xlsread does read csv files, so there should be no need to convert. Just read directly (tested on 2013b with a small file of mixed numeric and string data):
[num, text, alldata] = xlsread('test.csv')
Note: this apparently only works on Windows machines. If just changing the extension makes xlsread work for you, you can rename with movefile:
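oldfile = 'somefile.csv';            % file name must be a quoted string
ext = 'xls';
[~, name, ~] = fileparts(oldfile);   % strip the old extension
newfile = [name, '.', ext];          % add the dot back before the new extension
movefile(oldfile, newfile);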
If you have many files, this would go in a loop and oldfile would be taken from the output of something like a dir or ls command giving you all the .csv files.
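The loop mentioned above could look something like the following sketch. It creates a scratch folder with two dummy CSV files purely so that it runs on its own; the folder and the file names are made up for illustration.

```matlab
% Set up a scratch folder with two dummy CSV files (hypothetical names).
folder = tempname();
mkdir(folder);
names = {'a.csv', 'b.csv'};
for k = 1:numel(names)
    fid = fopen(fullfile(folder, names{k}), 'w');
    fprintf(fid, 'x,y\n1,2\n');
    fclose(fid);
end

% Rename every .csv file in the folder to .xls.
listing = dir(fullfile(folder, '*.csv'));
for k = 1:numel(listing)
    oldfile = fullfile(folder, listing(k).name);
    [~, name, ~] = fileparts(oldfile);
    newfile = fullfile(folder, [name '.xls']);
    movefile(oldfile, newfile);
end

renamed = dir(fullfile(folder, '*.xls'));   % the renamed files
leftover = dir(fullfile(folder, '*.csv'));  % should now be empty
```

If the original files need to be preserved, copyfile can be used in place of movefile.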
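Incidentally, while you might see it mentioned in older questions, textread is no longer recommended; use textscan instead for cases where you need more complexity or control over the input. It can be very powerful, but for this case it is probably like cracking a nut with a sledgehammer.
If you don't need the headers, for example, you can skip them with the HeaderLines option and read the rest of the file in a single textscan call. This format assumes unquoted fields; for quoted fields like those in the sample above, %q can be used in place of each specifier, with the numeric columns converted afterwards using str2double:
fid = fopen('filename.csv');
C = textscan(fid, '%f%s%s%f', 'Delimiter', ',', 'HeaderLines', 1); fclose(fid);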
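For files like the sample in the question, where every field is quoted and the header row is all text, another option is to read the header line separately and then split the remaining lines by hand. This is only a sketch: it assumes no field contains a comma or an embedded quote, and the temporary file and variable names are made up for illustration.

```matlab
% Write a few rows of the sample from the question to a temporary file.
fname = [tempname() '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '"Project","Task","Value Type", "Value"\n');
fprintf(fid, '"105", "06.05.02", "cost", "3434"\n');
fprintf(fid, '"106", "06.05.02", "obligation", "500"\n');
fclose(fid);

fid = fopen(fname, 'r');
% First line: header names, with quotes and surrounding spaces removed.
header = strtrim(strsplit(strrep(fgetl(fid), '"', ''), ','));
% Remaining lines: one row of text fields per line.
rows = {};
txt = fgetl(fid);
while ischar(txt)
    rows(end+1, :) = strtrim(strsplit(strrep(txt, '"', ''), ','));
    txt = fgetl(fid);
end
fclose(fid);
delete(fname);

% Convert the columns that are really numeric.
project = str2double(rows(:, 1));
value   = str2double(rows(:, 4));
```

This keeps everything in one cell array (rows) plus one header row, rather than 23 separate arrays, and only the columns that need to be numeric are converted.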

Matlab's Import Tool recognizes a column as numbers but generates %s in formatSpec

I use Matlab's Import Tool to generate a script that will take care of importing several CSV files with the same columns.
The Import Tool successfully manages to recognize the type of each column of my CSV file:
However, in the generated script, the same columns are cast as strings (%s = string):
Any idea why?
Surprisingly, it works fine with CSV files with fewer columns (it works with 70-column CSV files, but the issue arises with 120-column CSV files). Here is one example of a CSV file that triggers the issue.
I use R2014b x64 with Windows 7 SP1 x64 Ultimate.
This is happening because one of the columns in your file contains values that mix numbers and text. The Import Tool predicts that you will want to extract the numbers from this field, so it labels the column as 'NUMBER'. However, standard textscan doesn't allow for this, so the Import Tool has to generate code that reads all of the data as text and does the numeric conversion afterwards. The Import Tool is trying to help you avoid textscan errors.
The result of running the generated code is still numeric, as is shown in the Import Tool.
The specific column is labeled SEGMENT_ID in your example file. It contains data like:
l8K3ziItQ2pRFQ8
L79Y0zA8xF7fVTA
JYYqCCwRrigaUmD

Convert dataset of .mat format to .csv octave/matlab

There are datasets in .mat format on this site: http://www.cs.nyu.edu/~roweis/data.html
I want to change the format to .csv.
Can someone tell me how to convert them and create the .csv files?
Thanks!
Suppose that the .mat files from the site are available already. In the command window in Matlab, you may write, for example:
load('C:\Users\YourUserName\Downloads\mnist_all.mat');
to load the .mat file; the result should be a set of matrices test0, test1, ..., train0, train1 ... created in your workspace, which you want saved as CSV files. Because they are different sizes, you need to save one CSV per variable, e.g. (also in the command window):
csvwrite('C:\Users\YourUserName\Downloads\mnist_test0.csv', test0);
Repeat the command for each variable, and do not forget to change also the name of the output file to avoid overwriting.
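Rather than repeating the command by hand, the variables can be enumerated by loading the .mat file into a struct. The sketch below builds a tiny stand-in .mat file so that it runs on its own; the file name, the folder and the two matrices are made up and are not the actual contents of mnist_all.mat.

```matlab
% Create a small .mat file standing in for the downloaded data (hypothetical variables).
folder = tempname();
mkdir(folder);
test0  = [1 2 3; 4 5 6];
train0 = [7 8; 9 10; 11 12];
matfile = fullfile(folder, 'sample.mat');
save(matfile, 'test0', 'train0');

% Load into a struct so the variable names can be enumerated.
S = load(matfile);
vars = fieldnames(S);
for k = 1:numel(vars)
    data = S.(vars{k});
    if isnumeric(data)
        % One CSV per variable, named after the variable to avoid overwriting.
        csvwrite(fullfile(folder, ['sample_' vars{k} '.csv']), data);
    end
end

% Read one file back to check the round trip.
check = csvread(fullfile(folder, 'sample_train0.csv'));
```

Non-numeric variables (structs, cell arrays) are skipped here, since csvwrite only handles numeric matrices.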
Did you try the csvwrite function in Matlab?
Just load your .mat files with the load function and then write them with csvwrite!
I do not have a Matlab license so I installed GNU Octave 4.2.1 (2017) on Windows 10 (thank you to John W. Eaton and others). I was not fully successful using the csvwrite so I used the following workaround. (BTW, I am totally incompetent in the Octave world. csvwrite worked for simple data structures).
In the Command Window I used the following two commands
load myfile.mat
save("-text","myfile.txt","variablename")
When the "myfile.mat" is loaded, the variable names for the data vectors loaded are displayed in the workspace window. This is the name(s) to use in the save command. Some .mat files will load several data structures.
The "-text" option is the default, so you may not need to include this option in the command.
The output file lists the .mat file contents in text format as a single column (of potentially sequential variables). It should be easy to use your text editor to massage this data into the original matrix structure for use in whatever app you are comfortable with.
Had a similar issue. Needed to convert a series of .mat files that had two columns of numerical data into standard data files (ascii text). Note that I don't really ever use csv, but everything here could be adapted by using csvwrite instead of the standard save.
Using Octave 4.2.1 ....
load myfile.mat
LI = [L, I] ## L and I are column vectors representing my data
save myfile.txt LI
Note that L and I appear to be the default variable names chosen by Octave for the two column vectors in my original data file. A script that iterated over all files with the .mat extension in my directory would be ideal, but this got the job done. It saves the data as two space-separated columns.
*** Update
The following script works on Octave 4.2.1 for a series of data files with the .mat extension that are in the same directory. It will iterate over them and write the data out to text files with the same name but with the extension .dat . Note that this is not efficient, so if you have a lot of files or if they are large it can take a while to run. I would suggest that you run it from the command line using octave mat2dat.m so you can actually watch it go.
I make no guarantees that this will work for you, but it did for me. I also am NOT proficient in Octave or Matlab, so I'm sure a better solution exists.
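# mat2dat.m
# Convert every .mat file in the current directory to a text .dat file.
dirlist = glob("*.mat");
for i = 1:length(dirlist)
  filename = dirlist{i,1}                     # no semicolon: prints the name as progress
  load(filename, "L", "I");                   # L and I are the two column vectors
  LI = [L, I];
  tmpname = filename(1:length(filename)-3);   # keep the name up to and including the dot
  txtname = strcat(tmpname, 'dat');
  save(txtname, "LI");
end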

Reading large csv files with strings containing commas as one field

I have a large .csv file (~26000 rows) that I want to read into MATLAB. The complication is that one of the fields contains a collection of strings delimited by commas.
I'm having trouble reading it. I tried things like tdfread, which won't work here. Are there any tricks with textscan I should be aware of?
Is there any other way?
I'm not sure what is generating your CSV file but that is your problem.
The point of a CSV file is that the file itself designates the separation of fields. If the text of a field contains unquoted commas, then nothing you can do will help you. How would ANY program know whether a comma belongs to the text of a single field or is a field delimiter?
Proper CSV would have a text qualifier. Some generators/readers give you the option to use one. The standard text qualifier is a " (double quote). It's changeable, though, because your text may contain those, too.
Again, it's all about generating proper CSV content.
There's a chance that xlsread won't give you the answer you expect -- do the strings always appear in the same columns, for example? I think (as everyone else seems to :-) that it would be more robust to just use
fid = fopen('yourfile.csv');
and then either textscan
t = textscan(fid, '%s', 'delimiter', sprintf('\n'));
t = t{1};
or just fgetl (the example in the help is perfect).
After that you can do some line-by-line processing -- using textscan again on the text content of each line, for example, is a nice, quick way to get a cell-array that will allow fast analysis of each line.
You have a problem because you're reading it in as a .csv, and you have commas within your data. You can open it in Excel and manipulate the data, possibly extracting the unwanted commas with Excel formulas. I work with .csv files for DB imports quite a bit. I imagine MATLAB has similar rules, namely no commas in your data.
Can you tell us more about your data? Are there commas throughout, or just in one column? Maybe you can read it in as tab delimited?
Are you using a Unix system? The reason I am asking is that you could use a command-line function such as sed and regular expressions to clean those data files before you pass them into Matlab. Here is a link that explains how to do exactly what you are looking for.
Since, as others have observed, your file is CSV with commas inside what you think of as a single field, it's going to be hard to persuade Matlab that that really is only one field. I think your best strategy is going to be to read one line at a time, into a string acting as a buffer, and to translate it, field-by-field, into the variables or other data structures that you want. Since Matlab has in-built regular expression capabilities this shouldn't be too hard.
And, as others have already suggested, posting a sample of your data would help us to help you.
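As a sketch of that line-by-line strategy, and assuming the comma-containing field is wrapped in double quotes (which may or may not be true of the actual file), a regular expression can split one line only on the commas that sit outside quotes. The sample line below is invented for illustration.

```matlab
% One line as it might be returned by fgetl (hypothetical sample data).
txt = '26,"red, green, blue",ok';

% Split on commas that are followed by an even number of double quotes,
% i.e. commas that are not inside a quoted field.
fields = regexp(txt, ',(?=(?:[^"]*"[^"]*")*[^"]*$)', 'split');

% Strip the surrounding quotes from each field.
fields = regexprep(fields, '^"|"$', '');

% The quoted field can then be split into its own list of strings.
colours = strtrim(strsplit(fields{2}, ','));
```

If the fields are not quoted at all, this will not help, and the file would have to be regenerated or the position of the multi-valued field inferred from the fixed columns around it.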
One easy solution is:
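folder = 'C:\folder1\folder2\';   % avoid naming a variable "path": it shadows the built-in path function
file = 'data.csv';
data = dataset('xlsfile', fullfile(folder, file));   % fullfile handles the separator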
Of course you could also do the following:
[file, folder] = uigetfile('C:\folder1\folder2\*.csv');
data = dataset('xlsfile', fullfile(folder, file));
Now you will have loaded the data as a dataset array. An easy way to get column 1, for example, is
double(data(:,1))