Reading portion of CSV with multiple data types - matlab

I have a CSV file with both numbers and letters that I want to read. The file also has headers (first row), but I can read them separately, so that's not a concern.
What I can't solve is that the file has multiple data types and that I only want to read a portion of it (since the file is very large), say the first 5000 rows.
I've tried xlsread with three outputs, but I get the following error: "??? Error: Object returned error code: 0x800A03EC". I've also tried textscan, but if I understood correctly you have to specify the type of every column in the format string, and that's not very practical for me since I have a large number of columns. I hope this is not a duplicate; I've read other solutions but could not apply them to my problem.
Is there a way to do this?
Thank you in advance

To test the problem I created a small test.csv file.
It contains the following lines:
header1;header2;header3
a;1;xx
b;2;yy
c;3;zz
d;4;xxx
e;5;yyy
I use the following code to read the data:
range = 'A2:C3'
[num, text, both] = xlsread('test.csv', 1, range)
The output of the variable both, which contains the text and the numbers, is as expected:
both =
'a' [1] 'xx'
'b' [2] 'yy'
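A textscan-based alternative, sketched under some assumptions: the format string does not have to be typed by hand, because it can be built with repmat from the column count in the header line, and the repeat-count argument of textscan limits how many rows are read. The sketch reads every field as a string and afterwards converts the columns that parse cleanly as numbers. It assumes the ';' delimiter and single header row of the test.csv above (and overwrites test.csv with that sample so it runs on its own); nRows = 5000 mirrors the question.

```matlab
% Write the sample file from the question so the sketch is self-contained.
fid = fopen('test.csv', 'w');
fprintf(fid, 'header1;header2;header3\na;1;xx\nb;2;yy\nc;3;zz\nd;4;xxx\ne;5;yyy\n');
fclose(fid);

nRows = 5000;                          % maximum number of data rows to read
fid = fopen('test.csv', 'r');
header = fgetl(fid);                   % first line: column names
nCols = numel(strfind(header, ';')) + 1;
fmt = repmat('%s', 1, nCols);          % one '%s' per column, built automatically
C = textscan(fid, fmt, nRows, 'Delimiter', ';');   % stops after nRows rows or at EOF
fclose(fid);

data = [C{:}];                         % rows-by-nCols cell array of strings
for k = 1:nCols
    vals = str2double(data(:, k));     % NaN wherever a field is not a number
    if ~any(isnan(vals))               % convert only fully numeric columns
        data(:, k) = num2cell(vals);
    end
end
```

Converting with str2double after reading keeps the format string generic; the trade-off is that a numeric column containing an empty or malformed field stays as text.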

Related

MATLAB fwrite\fread issue: two variables are being concatenated

I am reading in a binary EDF file and have to split it into multiple smaller EDF files at specific points and then adjust some of the values inside. Overall it works quite well, but when I read the new file back, two character arrays get combined with each other. Obviously everything afterwards gets corrupted as well. I am at a dead end and have no idea what I'm doing wrong.
The part of the code (the writer) that must contain the problem:
byt=fread(fid,8,'*char');
fwrite(tfid,byt,'*char');
fwrite(tfid,fread(fid,44));
%new number of records
s = records;
fwrite(tfid,s,'*char');
fseek(fid,8,0);
%test
fwrite(tfid,fread(fid,8,'*char'),'*char');
When I use the reader, the number of records (written by fwrite(tfid,s,'*char')) is combined with the value of the next variable. All variables before it are displayed correctly. The relevant code of the reader:
hdr.bytes = str2double(fread(fid,8,'*char')');
reserved = fread(fid,44);%#ok
hdr.records = str2double(fread(fid,8,'*char')');
if hdr.records == -1
beep
disp('There appears to be a problem with this file; it returns an out-of-spec value of -1 for ''numberOfRecords.''')
disp('Attempting to read the file with ''edfReadUntilDone'' instead....');
[hdr, record] = edfreadUntilDone(fname, varargin);
return
end
hdr.duration = str2double(fread(fid,8,'*char')');
The likely problem is that your character array s does not have 8 characters in it, but you expect 8 when you read it from the file. fwrite writes exactly as many values as the array contains, so if s has fewer than 8 characters, the reader ends up consuming part of the next field.
One fix would be to pad s with blanks before writing it (this assumes records is already a character array; convert it with num2str first if it is numeric):
s = [blanks(8-numel(records)) records];
In addition, the '*char' syntax is only meaningful for fread, where the * indicates that the output class should also be 'char'. It's unnecessary with fwrite.
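A minimal round-trip sketch of the fixed-width idea, assuming a hypothetical file name field_demo.bin and a made-up record count of 42. It uses sprintf('%-8d', ...) instead of blanks, which pads on the right rather than the left; str2double parses either layout, and as far as I know EDF header fields are conventionally left-justified. It also drops the '*' from the fwrite precision and reads the fields back with the same calls as the reader above.

```matlab
nRecords = 42;                             % hypothetical number of records
field = sprintf('%-8d', nRecords);         % left-justified, space-padded to 8 chars
assert(numel(field) == 8, 'value does not fit in an 8-byte field');

fid = fopen('field_demo.bin', 'w');
fwrite(fid, field, 'char');                % exactly 8 bytes
fwrite(fid, sprintf('%-8s', '1'), 'char'); % next 8-byte field, e.g. the duration
fclose(fid);

fid = fopen('field_demo.bin', 'r');
recordsRead  = str2double(fread(fid, 8, '*char')');   % same calls as the reader
durationRead = str2double(fread(fid, 8, '*char')');
fclose(fid);
```

Because every field is forced to its declared width, the second fread starts exactly at the duration field instead of inside it.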

Index out of bounds after reading a text file

I have the following simple code, in which I try to access one of the values in a .txt file. The value I want is at index (4,1), and the matrix in the .txt file is 8-by-4. When I run the code, MATLAB gives me the following error:
Attempted to access q(4,1); index out of bounds because size(q)=[1,601]
Can someone help me understand why I receive the error and how to fix it?
Here is the code:
q = fileread('sv11edit.txt');
toe = q(4,1)
The answer will depend on the format of the file sv11edit.txt. However, fileread returns the whole file as a single row of characters; in this case, a row that is 601 characters long. You receive an error because you assume that q is 8-by-4, but it is not.
Check what is being stored in q before you try anything like the second line of your code. If the file contains only numbers (8 rows of 4 values), load may be a better alternative to fileread, since it returns them as an 8-by-4 numeric matrix.
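A small sketch contrasting the two functions, assuming a hypothetical, purely numeric 8-by-4 file named sv11demo.txt in place of sv11edit.txt (whose real contents are unknown). fileread gives back the raw text as one char row, while load parses the numbers into a matrix that can be indexed with (row, column).

```matlab
M = reshape(1:32, 8, 4);             % example 8-by-4 matrix
fid = fopen('sv11demo.txt', 'w');
fprintf(fid, '%g %g %g %g\n', M.');  % transpose so the file is written row by row
fclose(fid);

txt = fileread('sv11demo.txt');      % 1-by-N char array: just the raw text
q   = load('sv11demo.txt');          % 8-by-4 double matrix
toe = q(4, 1);                       % valid index now
```

If the real file mixes text and numbers, load will fail, and a parser such as textscan is the better fit.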

How to import large dataset and put it in a single column

I want to import a large data set (multiple columns) using the following code. I want all the values in a single column instead of one row with many columns, so I transposed the result, but it still doesn't work properly.
clc
clear all
close all
dataX_Real = fopen('dataX_Real_in.txt');dataX_Real=dataX_Real';
I would really appreciate your support and suggestions. Thank you.
The sample files can be found using the following link.
When using fopen, all you are doing is opening the file; you aren't reading in the data. What fopen returns is a file identifier (an integer) that gives you access to the contents of the file, but it doesn't read the contents itself. You would need to use functions like fread or fscanf to read in the text data.
However, I would recommend you use dlmread instead, as it doesn't require a separate fopen call. It opens the file, reads the contents and stores them in a variable in one function call:
dataX_Real = dlmread('dataX_Real_in.txt');
By doing the above and using your text file, I get 44825 elements. Here are the first 10 entries of your data:
>> format long;
>> dataX_Real(1:10)
ans =
Columns 1 through 4
-0.307224970000000 0.135961950000000 -1.072544100000000 0.114566020000000
Columns 5 through 8
0.499754310000000 -0.340369000000000 0.470609910000000 1.107567700000000
Columns 9 through 10
-0.295783020000000 -0.089266816000000
That matches what we see in your text file! However, you said you wanted it as a single column. Because the values sit on a single line of the file, dlmread returns them as a row vector, so here you can simply transpose:
dataX_Real = dataX_Real.';
Displaying the first 10 elements, we get:
>> dataX_Real = dataX_Real.';
>> dataX_Real(1:10)
ans =
-0.307224970000000
0.135961950000000
-1.072544100000000
0.114566020000000
0.499754310000000
-0.340369000000000
0.470609910000000
1.107567700000000
-0.295783020000000
-0.089266816000000
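A sketch of the fopen/fscanf route mentioned above, plus the (:) operator as an alternative to transposing. It assumes a hypothetical stand-in file dataX_demo.txt holding the first three values from the output above on one line, since the original sample file is not available here.

```matlab
% Stand-in data file: three numbers on one line.
fid = fopen('dataX_demo.txt', 'w');
fprintf(fid, '%.9f %.9f %.9f\n', -0.30722497, 0.13596195, -1.0725441);
fclose(fid);

% fopen only returns a file identifier; fscanf does the actual reading.
fid = fopen('dataX_demo.txt', 'r');
vals = fscanf(fid, '%f');              % numeric-only format: returns a column vector
fclose(fid);

row = dlmread('dataX_demo.txt');       % one line in the file -> one row (1-by-3)
col = row(:);                          % (:) always gives a column, whatever the shape
```

Unlike .', the (:) form also flattens a matrix (column by column) if the file turns out to have several values per line on several lines.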

How to import column of numbers from mixed string numerical text file

Variations of this question have already been asked several times, for example here. However, I can't seem to get this to work for my data.
I have a text file with 3 columns. The first and third columns are floating-point numbers; the middle column contains strings. I'm really only interested in the first column.
Here's what I tried:
filename=fopen('heartbeatn1nn.txt');
A = textscan(filename,'%f','HeaderLines',0);
fclose(filename);
When I do this, A contains just a single number: the first element in the column. How do I get the whole column? I've also tried this with the '.tsv' file extension, with the same result.
Also tried:
filename=fopen('heartbeatn1nn.txt');
formatSpec='%f';sizeA=[1 Inf];
A = fscanf(filename,formatSpec,sizeA);
fclose(filename);
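with the same result.
Could the file size be a problem? I'm not sure how many rows there are, but probably quite a few, since the file is 1.7 MB.
With '%f' alone, textscan stops as soon as the text no longer matches the format: it reads the first number, then hits the string in the second column, which %f cannot parse, so you get only one value. Assuming the columns in your text file are separated by whitespace, your format specification should describe all three columns: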
A = textscan(filename,'%f %s %f');
A now contains the complete file content. To obtain the first column:
first_col = A{1};
Alternatively, you can tell textscan to skip the unneeded fields with the * option:
first_col = cell2mat( textscan(filename, '%f %*s %*f') );
This returns only the first column.
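A self-contained sketch of the skip-field approach, assuming a hypothetical stand-in file heartbeat_demo.txt with three whitespace-separated columns (number, label, number) in the layout the question describes; the values and labels are made up.

```matlab
fid = fopen('heartbeat_demo.txt', 'w');
fprintf(fid, '0.812 N 1.5\n0.790 N 1.6\n0.805 V 1.4\n');
fclose(fid);

fid = fopen('heartbeat_demo.txt', 'r');
C = textscan(fid, '%f %*s %*f');     % %*s and %*f read but discard fields 2 and 3
fclose(fid);

first_col = C{1};                    % numeric column vector, one entry per line
```

Closing the file with fclose after textscan also avoids leaving the identifier open, which matters when the same file is read again.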

Storing comma separated .csv data from a web source into a matrix in matlab

I'm trying to download this comma-separated data and store it as a matrix that I can then access. So far I have code that I think saves the data to a file called test.csv, but I'm not sure:
>> urlwrite('http://xweb.geos.ed.ac.uk/~weather/jcmb_ws/JCMB_2013_Mar.csv','test.csv');
d = csvread('test.csv');
??? Error using ==> dlmread at 145
Mismatch between file and format string.
Trouble reading number from file (row 1, field 1) ==> date-
Error in ==> csvread at 50
m=dlmread(filename, ',', r, c);
I am getting the above error. It reads the data fine using urlread. Does anybody know what the correct syntax should be and how I can get it stored as a matrix? Thanks in advance.
You can get the data directly from the web, without saving it to a file, using urlread:
webdata = urlread('http://xweb.geos.ed.ac.uk/~weather/jcmb_ws/JCMB_2013_Mar.csv');
This will give you the whole file as one string with lines delimited by '\n'. You can process it in multiple ways. For example:
tmp = textscan(webdata,['%s',repmat('%f',1,8)],'delimiter',',','headerlines',1);
date = tmp{1};
data = horzcat(tmp{2:end});
To get column headers you can do, for example:
colheader = textscan(webdata,'%s',1,'delimiter','\n');
colheader = regexp(colheader{:},',','split');
colheader = colheader{:};
You can also convert the data to a structure:
Data = cell2struct(tmp, genvarname(colheader),2);
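A sketch of the same textscan call run on text that is already in memory, so it works without network access. It assumes a made-up string with one header line, a date column and two numeric columns standing in for the downloaded JCMB file (the real file has more columns, hence the 8 in the code above), and names the output dates to avoid shadowing MATLAB's date function.

```matlab
% Made-up CSV text standing in for the output of urlread.
webdata = sprintf('date-time,temp,wind\n01/03/2013,3.5,2.1\n02/03/2013,3.4,2.3\n');

nNum = 2;                                         % numeric columns after the date
tmp  = textscan(webdata, ['%s', repmat('%f', 1, nNum)], ...
                'Delimiter', ',', 'HeaderLines', 1);
dates = tmp{1};                                   % cell array of date strings
data  = horzcat(tmp{2:end});                      % numeric matrix, one column per field
```

Reading the date column with %s is what csvread cannot do, which is why this route works on the mixed file.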
Try using readtext.m, a third-party function (not part of MATLAB) that can read almost any text file. The problem with your data may be that the delimiters are not uniform, i.e. some columns are separated by tabs and others by commas.
The operation can be performed like this:
urlwrite('http://xweb.geos.ed.ac.uk/~weather/jcmb_ws/JCMB_2013_Mar.csv','test.csv');
data= readtext('test.csv');
It should work.
Your problem lies right here:
Trouble reading number from file (row 1, field 1) ==> date-
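MATLAB reports that it found "date-" in the first cell, so the file most likely starts with one or more header rows. Check the file and call
d = csvread('test.csv', ROW, 0);
where ROW is the zero-based row offset at which the numeric data starts, i.e. the number of header rows. Note that csvread only handles numeric fields, so if the data rows themselves contain text such as dates, use textscan as in the first answer instead.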
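A sketch of the zero-based offsets, assuming a made-up all-numeric file csv_demo.csv with a single header line in place of test.csv. Skipping one header row means R1 = 1, not 2.

```matlab
fid = fopen('csv_demo.csv', 'w');
fprintf(fid, 'time,temp,wind\n1,3.5,2.1\n2,3.4,2.3\n');
fclose(fid);

% R1 = 1 skips one header row; C1 = 0 starts at the first column.
d = csvread('csv_demo.csv', 1, 0);   % 2-by-3 numeric matrix
```

Passing R1 = 2 here would silently drop the first data row, which is easy to miss on a large file.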