Trying to load text file into Octave GUI into the correct format - import

I have a text file that contains 10 columns separated by commas and X number of rows separated by newlines. The first line of the file is a header string. The first two columns are character strings, while the last 8 columns are integers. So far I have tried fscanf:
t1 = fscanf( test, '%c', inf );
which imports the data as one large 1-by-XXXXX character matrix, and textread, which imports it but does not format it correctly:
[a,b,c,d,e,f,g,h,i,j] = textread("test.txt", "%s %s %s %s %s %s %s %s %s %s",\
'headerlines', 1);
I suspect it's a simple issue of writing the textread format string correctly to get my desired output. Any help is greatly appreciated.
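One possible way to get typed columns is sketched below. It swaps textread for textscan and assumes the layout described above (one header line, two string columns, eight integer columns, comma-separated). The sample file contents and the variable names labels and values are made up for illustration.

```octave
% Write a small hypothetical sample file so the sketch is self-contained.
fname = tempname();
fid = fopen(fname, 'w');
fprintf(fid, 'id,name,v1,v2,v3,v4,v5,v6,v7,v8\n');
fprintf(fid, 'ab,cd,1,2,3,4,5,6,7,8\n');
fprintf(fid, 'ef,gh,9,10,11,12,13,14,15,16\n');
fclose(fid);

% Two string columns followed by eight integer columns.
fmt = ['%s %s' repmat(' %d', 1, 8)];

fid = fopen(fname, 'r');
C = textscan(fid, fmt, 'Delimiter', ',', 'HeaderLines', 1);
fclose(fid);
delete(fname);

labels = [C{1}, C{2}];        % N-by-2 cell array of strings
values = double([C{3:10}]);   % N-by-8 numeric matrix
```

Note that 'HeaderLines' is given as the number 1, not the string '1'.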

Related

How to skip first few rows when reading from a file in matlab [duplicate]

When I try to use headerlines with textscan to skip the first line of the text file, all of my data cells are stored as empty.
fid = fopen('RYGB.txt');
A = textscan(fid, '%s %s %s %f', 'HeaderLines', '1');
fclose(fid);
This code gives
1x4 Cell
[] [] [] []
Without the headerlines part and without a first line that needs to be skipped in the text file, the data is read in with no problem. It creates a 1x4 cell with data cells containing all of the information from the text file in columns.
What can I do to skip the first line of the text file and read my data in normally?
Thanks
I think your problem is that you have specified a string instead of an integer value for HeaderLines. The character '1' is interpreted as its ASCII value, 0x31 (49 decimal), so the first 49 lines are skipped. Your file probably contains 49 lines or less, so everything ends up being discarded. This is why you're getting empty cells.
The solution is to replace '1' with 1 (i.e. remove the quotes), like so:
A = textscan(fid, '%s %s %s %f', 'HeaderLines', 1);
and this should do the trick.
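As a quick check of the explanation above, the sketch below prints the numeric code of the character '1' and then reads a small file with a numeric HeaderLines value. The file contents are hypothetical stand-ins for RYGB.txt.

```octave
% Numeric code of the character '1' (this is the value the answer refers to).
code = double('1');
disp(code);

% Write a small hypothetical file: one header line, two data lines.
fname = tempname();
fid = fopen(fname, 'w');
fprintf(fid, 'name color shape value\n');
fprintf(fid, 'a red round 1.5\n');
fprintf(fid, 'b blue square 2.5\n');
fclose(fid);

% Pass HeaderLines as a number so exactly one line is skipped.
fid = fopen(fname, 'r');
A = textscan(fid, '%s %s %s %f', 'HeaderLines', 1);
fclose(fid);
delete(fname);
```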

read complicated format .txt file into Matlab

I have a txt file that I want to read into Matlab. Data format is like below:
term2 2015-07-31-15_58_25_612 [0.9934343, 0.3423043, 0.2343433, 0.2342323]
term0 2015-07-31-15_58_25_620 [12]
term3 2015-07-31-15_58_25_625 [2.3333, 3.4444, 4.5555]
...
How can I read these data in the following way?
name = [term2 term0 term3] or namenum = [2 0 3]
time = [2015-07-31-15_58_25_612 2015-07-31-15_58_25_620 2015-07-31-15_58_25_625]
data = {[0.9934343, 0.3423043, 0.2343433, 0.2342323], [12], [2.3333, 3.4444, 4.5555]}
I tried to use textscan in this way 'term%d %s [%f, %f...]', but for the last data part I cannot specify the length because they are different. Then how can I read it? My Matlab version is R2012b.
Thanks a lot in advance if anyone could help!
There may be a way to do that in a single pass, but for me this kind of problem is easier to sort out with a two-pass approach.
Pass 1: Read all the columns with a constant format according to their type (string, integer, etc ...) and read the non constant part in a separate column which will be processed in second pass.
Pass 2: Process your irregular column according to its specificities.
In a case with your sample data, it looks like this:
%% // read file
fid = fopen('Test.txt','r') ;
M = textscan( fid , 'term%d %s %*c %[^]] %*[^\n]' ) ;
fclose(fid) ;
%% // dispatch data into variables
name = M{1,1} ;
time = M{1,2} ;
data = cellfun( @(s) textscan(s,'%f',Inf,'Delimiter',',') , M{1,3} ) ;
What happened:
The first textscan instruction reads the full file. In the format specifier:
term%d read the integer after the literal expression 'term'.
%s read a string representing the date.
%*c ignore one character (to ignore the character '[').
%[^]] read everything (as a string) until it finds the character ']'.
%*[^\n] ignore everything until the next newline ('\n') character (to not capture the last ']').
After that, the first 2 columns are easily dispatched into their own variables. The 3rd column of the result cell array M contains strings of different lengths, each holding a different number of floating point values. We use cellfun in combination with another textscan to read the numbers in each cell and return a cell array containing doubles.
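The second pass can be tried on its own, without a file. In this minimal sketch, raw is a hypothetical stand-in for M{1,3} (the bracket contents captured in pass 1), and parse_row is an illustrative helper name. Unlike the answer's call, it unwraps each textscan result with cell2mat and uses 'UniformOutput', false, so every cell of data holds a plain numeric column.

```octave
% Hypothetical stand-in for M{1,3}: one string per row of the file.
raw = {'0.9934343, 0.3423043, 0.2343433, 0.2342323'; ...
       '12'; ...
       '2.3333, 3.4444, 4.5555'};

% Parse one comma-separated string into a numeric column vector.
parse_row = @(s) cell2mat(textscan(s, '%f', 'Delimiter', ','));

% Rows have different lengths, so collect the results in a cell array.
data = cellfun(parse_row, raw, 'UniformOutput', false);

counts = cellfun(@numel, data);   % number of values found in each row
```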
Bonus:
If you want your time to be a numeric value as well (instead of a string), use the following extension of the code:
%% // read file
fid = fopen('Test.txt','r') ;
M = textscan( fid , 'term%d %f-%f-%f-%f_%f_%f_%f %*c %[^]] %*[^\n]' ) ;
fclose(fid) ;
%% // dispatch data
name = M{1,1} ;
time_vec = cell2mat( M(1,2:7) ) ;
time_ms = M{1,8} ./ (24*3600*1000) ; %// take care of the milliseconds separately as they are not handled by "datenum"
time = datenum( time_vec ) + time_ms ;
data = cellfun( @(s) textscan(s,'%f',Inf,'Delimiter',',') , M{1,end} ) ;
This will give you an array time with a Matlab time serial number (often easier to use than strings). To show you that the serial number still represents the right time:
>> datestr(time,'yyyy-mm-dd HH:MM:SS.FFF')
ans =
2015-07-31 15:58:25.612
2015-07-31 15:58:25.620
2015-07-31 15:58:25.625
For complicated string parsing situations like this, it is best to use regexp. In this case, assuming you have the data in the file data.txt, the following code should do what you are looking for:
txt = fileread('data.txt')
tokens = regexp(txt,'term(\d+)\s(\S*)\s\[(.*)\]','tokens','dotexceptnewline')
% Convert namenum to numeric type
namenum = cellfun(@(x)str2double(x{1}),tokens)
% Get time stamps from the second element of each token
time = cellfun(@(x)x{2},tokens,'UniformOutput',false);
% Split the numbers in the third element of each token
data = cellfun(@(x)str2double(strsplit(x{3},',')),tokens,'UniformOutput',false)

trying to use "," delimiter in octave

I am trying to use the textscan function. Here is the data I am trying to read:
"0", "6/23/2015 12:21:59 PM", "93.161", "95.911","94.515","95.917", "-5511.105","94.324","-1415.849","2.376","2.479"
"1", "6/23/2015 12:22:02 PM", "97.514", "96.068","94.727","96.138","-12500.000","94.540","-8094.912","2.386","2.479"
The data logger I am using puts quotes around all values, even though they are numbers. If the values were plain numbers separated by commas, I could just use csvread. You can see some of my commented-out failed attempts. Here is the code I have been trying:
fileID = fopen('test3.txt');
%C = textscan(fileID,'"%f%s%f%f%f%f%f%f%f%f%f"', 'delimiter', '","');
C = textscan(fileID,'"%f","%s","%f","%f","%f","%f","%f","%f","%f","%f","%f"');
%C = textscan(fileID,'%s', 'delimiter', '"');
%C = strread(fileID, "%s %s %f %f %f %f %f %f %f %f %f", ",");
fclose(fileID);
celldisp(C)
If I run line 3 I get:
C{1} =
NaN
NaN
94.324
NaN
... omitted lines here ...
NaN
99.546
NaN
If I run lines 4, 5, or 6, I get:
warning: strread: unable to parse text or file with given format string
warning: called from
strread at line 688 column 7
textscan at line 318 column 8
test2 at line 4 column 3
error: some elements undefined in return list
error: called from
textscan at line 318 column 8
test2 at line 4 column 3
You want the magic word. The magic word is not please here, it's multipledelimsasone.
Basically, you want both " and , to be treated as delimiter characters. textscan looks for any of the delimiter characters, not a given order, which is why '","' didn't do what you expected. Turning multipledelimsasone on makes textscan treat any combination of " and , as a single delimiter.
C = textscan(fileID,'%f%s%f%f%f%f%f%f%f%f%f', 'delimiter', '," ','multipledelimsasone',1);
Without this option on, textscan sees lots of empty values; the delimiter list isn't taken as any sort of order, just a list of possible separators. So if it sees ",", it thinks you have three delimiters with nothing in between → two empty values → NaN.
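The same effect can be seen in isolation with strsplit, used here only as a stand-in for illustration (it is not textscan, and its 'CollapseDelimiters' option is only analogous to multipledelimsasone). The sample string and the helper name keep are made up for this sketch.

```octave
% Two quoted values, as the data logger would write them.
s = '"93.161","95.911"';
delims = {',', '"'};

% Each delimiter character counts on its own: empty fields appear
% between the adjacent delimiters.
separate = strsplit(s, delims, 'CollapseDelimiters', false);

% Runs of delimiter characters are merged into a single separator.
merged = strsplit(s, delims, 'CollapseDelimiters', true);

% Drop any remaining empty fields and convert the rest to numbers.
keep = @(c) c(~cellfun(@isempty, c));
values = str2double(keep(merged));
```

Comparing numel(separate) with numel(merged) shows how many spurious empty fields the uncollapsed split produces.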

Textscan Matlab ; Doesn't read the format

I have a file in the following format:
**400**,**100**::400,descendsFrom,**76**::0
**400**,**119**::400,descendsFrom,**35**::0
**400**,**4**::400,descendsFrom,**45**::0
...
...
Now I need to read only the parts in bold. I've written the following formatspec:
formatspec = '%d,%d::%*d,%*s,%d::%*d\n';
data = textscan(fileID, formatspec);
It doesn't seem to work. Can someone tell me what's wrong?
I also need to know how to 'not use' delimiter, and how to proceed if I want to express the exact way my file is written in, for example in the case above.
EDITED
A possible problem is the %s part of the formatspec variable. Because %s matches an arbitrary string, the rest of the line (descendsFrom,76::0) is consumed by that one string field. So with the formatspec '%d,%d::%d,%s,%d::%d\n' you will get the following cells from the first line:
400 100 400 'descendsFrom,76::0'
To solve this problem you have two possibilities:
formatspec = '%d,%d::%d,descendsFrom,%d::%d\n';
OR
formatspec = '%d,%d::%d,%12s,%d::%d\n';
In the first case, every row must contain the literal string 'descendsFrom' (as in your example). In the second case the string may vary, but its length must be exactly 12 characters.
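The literal-text idea can be checked on a string with sscanf, which is swapped in here for textscan because it accepts the same kind of format with literals and skipped fields (%*d). This is a sketch under the assumption that the ** markers in the question are only formatting and are not present in the file; the two sample lines are copied from the question without them.

```octave
% Two sample lines, assumed to contain no ** markers.
txt = sprintf('%s\n', '400,100::400,descendsFrom,76::0', ...
                      '400,119::400,descendsFrom,35::0');

% Literal text is matched as written; %*d reads a number and discards it.
fmt = '%d,%d::%*d,descendsFrom,%d::%*d';

% Three kept values per line; reshape so that each row is one line.
vals = sscanf(txt, fmt, [3 Inf])';
```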
Your delimiter is ",". You should first split on it, then maybe run a regex. Here is how I would go about it:
fileID = fopen('file.csv');
D = textscan(fileID,'%s %s %s %s ','Delimiter',','); %read everything as strings
column1 = regexprep(D{1},'\*','')
column2 = regexprep(D{2},{'\*',':'},{'',''})
column3 = D{3}
column4 = regexprep(D{4},{'\*',':'},{'',''})
This should generate your 4 columns, which you can then combine.
I believe the Delimiter can only be one symbol. The more efficient way is to directly do regexprep on your entire line, which would generate:
test = '**400**,**4**::400,descendsFrom,**45**::0'
test = regexprep(test,{'\*',':'},{'',''})
>> test = 400,4400,descendsFrom,450
You can do multiple delimiters in textscan, they need to be supplied as a cell array of strings. You don't need the end of line character in the format, and you need to set 'MultipleDelimsAsOne'. Don't have MATLAB to hand but something along these lines should work:
formatspec = '%d %d %*d %*s %d %*d';
data = textscan(fileID, formatspec,'Delimiter',{',',':'},'MultipleDelimsAsOne',1);
If you want to return it as a matrix of numbers rather than a cell array, try also adding the option 'CollectOutput',1.

Load CSV with text qualifier into MATLAB/Octave

Data
Assume the following data format (with a header line in the first row, 500+ rows):
1, "<LastName> ,<Title>. <FirstName>", <Gender>, 99.9
My Code
I've tried this (IGNORE: see edit below):
[flag, name, gender, age] = textread('file.csv', '%d %q %s %f', 'headerlines', 1);
The Error
...and get the following error message
error: textread: A(I): index out of bounds; value 1 out of bound 0
error: called from:
error: C:\Program Files\Octave\Octave3.6.2_gcc4.6.2\share\octave\3.6.2\m\io\textread.m at line 75, column 3
Questions:
Is my format string incorrect given the text qualifier (and the comma embedded in the "name" string)?
Am I even using the correct method of loading a CSV into MATLAB/Octave?
EDIT
I forgot the delimiter (error message returns failure on different line in strread.m):
[flag, name, gender, age] = textread('file.csv', '%d %q %s %f', 'headerlines', 1, 'delimiter', ',');
I went with the code below. However, it splits the text-qualified name field into two separate fields, so any text-qualified field that contains the field delimiter creates an extra output column. (I'm still interested to know why the %q format didn't work for this field. Whitespace, perhaps?)
% Begin CSV Import ============================================================================
% strrep is used to strip the text qualifier out of each row. This is wrapped around the
% call to textread, which brings the comma delimited data in row-by-row, and skips the 1st row,
% which holds column field names.
tic;
data = strrep(
    textread(
        'file.csv'              % File name within current working directory
        ,'%s'                   % Each row is a single string
        ,'delimiter', '\n'      % Each new row is delimited by the newline character
        ,'headerlines', 1       % Skip importing the first n rows
    )
    ,'"'
    ,''
);
for i = 1:length(data)
    delimpos = findstr(data{i}, ",");
    start = 1;
    for j = 1:length(delimpos) + 1,
        if j < length(delimpos) + 1,
            csvfile{i,j} = data{i}(start:delimpos(j) - 1);
            start = delimpos(j) + 1;
        else
            csvfile{i,j} = data{i}(start:end);
        end
    end
end
% Return summary information to user
printf('\nCSV load completed in -> %f seconds\nm rows returned = %d\nn columns = %d\n', toc, size(csvfile)(1), size(csvfile)(2));
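The extra column comes from stripping the quotes before splitting on commas. One possible alternative is to split each row with a regular expression that treats a quoted run as a single field, and only then remove the quotes. This is a minimal sketch, not a full CSV parser: the sample row is hypothetical, empty fields are skipped, and escaped quotes inside a field are not handled.

```octave
% Hypothetical row in the format described above.
row = '1, "Smith ,Mr. John", male, 99.9';

% A field is either a double-quoted run (commas allowed inside) or a run of
% non-comma characters that does not start with a space.
fields = regexp(row, '"[^"]*"|[^, ][^,]*', 'match');

% Remove the text qualifier only after the split.
fields = strrep(fields, '"', '');

flag   = str2double(fields{1});
name   = fields{2};
gender = fields{3};
age    = str2double(fields{4});
```

Applied inside the loop over data (before the strrep that strips the quotes), this would keep the name in a single column.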