Trying to use "," as a delimiter in Octave - MATLAB

I am trying to use the textscan function. Here is the data I am trying to read:
"0", "6/23/2015 12:21:59 PM", "93.161", "95.911","94.515","95.917", "-5511.105","94.324","-1415.849","2.376","2.479"
"1", "6/23/2015 12:22:02 PM", "97.514", "96.068","94.727","96.138","-12500.000","94.540","-8094.912","2.386","2.479"
The data logger I am using puts quotes around all values even though they are numbers. If the values weren't quoted I could probably just use csvread. Here is the code I have been trying; you can see some of my commented-out failed attempts:
fileID = fopen('test3.txt');
%C = textscan(fileID,'"%f%s%f%f%f%f%f%f%f%f%f"', 'delimiter', '","');
C = textscan(fileID,'"%f","%s","%f","%f","%f","%f","%f","%f","%f","%f","%f"');
%C = textscan(fileID,'%s', 'delimiter', '"');
%C = strread(fileID, "%s %s %f %f %f %f %f %f %f %f %f", ",");
fclose(fileID);
celldisp(C)
If I run line 3 I get:
C{1} =
NaN
NaN
94.324
NaN
... omitted lines here ...
NaN
99.546
NaN
If I run lines 4, 5, or 6, I get:
warning: strread: unable to parse text or file with given format string
warning: called from
strread at line 688 column 7
textscan at line 318 column 8
test2 at line 4 column 3
error: some elements undefined in return list
error: called from
textscan at line 318 column 8
test2 at line 4 column 3

You want the magic word. The magic word here isn't please; it's multipledelimsasone.
Basically, you want both " and , to be treated as delimiter characters. textscan treats the delimiter argument as a set of individual characters, not an ordered sequence, which is why '","' didn't do what you expected. Turning multipledelimsasone on makes textscan treat any run of " and , as a single delimiter.
C = textscan(fileID,'%f%s%f%f%f%f%f%f%f%f%f', 'delimiter', '," ','multipledelimsasone',1);
Without this option on, textscan sees lots of empty values: since the delimiter list is just a set of possible separators, when it hits ",", it sees three consecutive delimiters with nothing in between, i.e. two empty fields, which %f turns into NaN.
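As a minimal illustration of the difference (a made-up one-line string rather than the question's file; textscan accepts char input directly):

```matlab
% Hypothetical quoted line, numeric-only to keep the example short:
s = '"1.5","2.5","3.5"';

% Without the option, the runs of " and , around each value are seen as
% several delimiters with empty fields between them, which %f turns into NaN:
bad = textscan(s, '%f%f%f', 'delimiter', '",');

% With multipledelimsasone on, any run of " and , collapses into one
% delimiter, so the three numbers come out cleanly:
good = textscan(s, '%f%f%f', 'delimiter', '",', 'multipledelimsasone', 1);
% good should be {1.5, 2.5, 3.5}
```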

Related

How to skip first few rows when reading from a file in matlab [duplicate]

When I try to use headerlines with textscan to skip the first line of the text file, all of my data cells are stored as empty.
fid = fopen('RYGB.txt');
A = textscan(fid, '%s %s %s %f', 'HeaderLines', '1');
fclose(fid);
This code gives
1x4 Cell
[] [] [] []
Without the headerlines part and without a first line that needs to be skipped in the text file, the data is read in with no problem. It creates a 1x4 cell with data cells containing all of the information from the text file in columns.
What can I do to skip the first line of the text file and read my data in normally?
Thanks
I think your problem is that you have specified a string instead of an integer value for HeaderLines. The character '1' is interpreted as its character code, 0x31 (49 decimal), so the first 49 lines are skipped. Your file probably contains 49 lines or fewer, so everything ends up being discarded. This is why you're getting empty cells.
The solution is to replace '1' with 1 (i.e. remove the quotes), like so:
A = textscan(fid, '%s %s %s %f', 'HeaderLines', 1);
and this should do the trick.
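The char-to-integer conversion behind this is easy to verify at the prompt:

```matlab
% A char used where a number is expected contributes its character code:
double('1')   % ans = 49
% So 'HeaderLines', '1' asks textscan to skip 49 lines,
% while 'HeaderLines', 1 skips exactly one.
```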

Trying to load text file into Octave GUI into the correct format

I have a text file that contains 10 columns separated by commas and X number of rows separated by newlines. The initial header line of the data is a string. The first two columns are character strings while the last 8 columns are integers. So far I have tried fscanf:
t1 = fscanf( test, '%c', inf );
which will import the data as a large 1-by-XXXXX character matrix, and textread, which imports it but does not format it correctly:
[a,b,c,d,e,f,g,h,i,j] = textread("test.txt", "%s %s %s %s %s %s %s %s %s %s",\
'headerlines', 1);
I suspect it's a simple issue of getting the format notation on textread right to achieve my desired output. Any help is greatly appreciated.
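Untested sketch: based on the description (one header line, comma-separated, two string columns followed by eight numeric columns; the file name test.txt is taken from the question), a textscan call along these lines might produce the desired layout:

```matlab
fid = fopen('test.txt');
% Two string fields, then eight numeric fields, split on commas,
% skipping the header line:
C = textscan(fid, '%s %s %f %f %f %f %f %f %f %f', ...
             'delimiter', ',', 'headerlines', 1);
fclose(fid);
% C{1} and C{2} are cell arrays of strings;
% C{3}..C{10} are numeric column vectors.
```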

Why can't the matlab textscan function read + 22.24 as a float?

I'm currently having a problem with the matlab function textscan.
I got a data file which looks like this:
1,2018/08/14 17:06:15, 0,+ 22.24,+ 22.46,+ 18.18,+0.0000,+0.0005,LLLLLLLLLL,LLLLLLLLLL,LLLL
or sometimes when a sensor isn't working properly it looks like this:
1,2018/07/11 17:02:53, 0,+ 23.88,+ 24.78,+ 23.65,+++++++,+ 23.94,+ 23.01,+ 24.33,LLLLLLLLLL,LLLLLLLLLL,LLLL
Since the data varies from file to file I am creating a matching formatSpec from the headerline.
In the 1st case it would look like
formatSpec = '%*u %s %*u%f%f%f%f%f%*[^\n]'
and in the 2nd case like
formatSpec = '%*u %s %*u%f%f%f%f%f%f%f%*[^\n]'
I am using the textscan function like this:
textscan(fileID, formatSpec_data, data_rows, 'Delimiter', ',', 'TreatAsEmpty', {'+++++++'},'EmptyValue', NaN, 'ReturnOnError', 0 );
but it keeps throwing an error on me with the message
Error using textscan
Mismatch between file and format character vector.
Trouble reading 'Numeric' field from file (row number 1, field number 4) ==> + 23.88,+ 24.78,+ 23.65,+++++++,+ 23.94,+ 23.01,+ 24.33,LLLLLLLLLL,LLLLLLLLLL,LLLL\n
Error in data_logger (line 31)
dataArray = textscan(fileID, formatSpec_data, data_rows, 'Delimiter', delimiter, 'HeaderLines' ,startRow, 'TreatAsEmpty', {'+++++++'},'EmptyValue', NaN, 'ReturnOnError', 0 );
When I deactivate 'ReturnOnError', textscan reads only the first row, and apart from the date/time string everything is just empty. I also tried using textscan without TreatAsEmpty and/or EmptyValue, but I get the same result.
I really don't get why textscan has problems reading e.g. + 22.24 as a float.
When I specify formatSpec to read all the data as strings it works but then I have to use str2num afterwards which I don't really want to do.
I'm thankful for every help and looking forward to understand this behaviour.
Short answer: MATLAB doesn't like the space between the + and the number in those fields. I think the simplest solution may be to just tell MATLAB to ignore the + by calling it white space. Add the arguments 'WhiteSpace','+' when you call textscan, like this:
textscan(fileID, formatSpec_data, data_rows, 'Delimiter', ',', 'EmptyValue', NaN, 'ReturnOnError', 0 , 'WhiteSpace', '+');
Note that I also removed the 'TreatAsEmpty' argument, because once you consider all the + as white space, it is empty anyway.
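As a quick sanity check on a single made-up line (textscan can parse a char array directly; the format mirrors the question's first case, truncated to two numeric fields):

```matlab
line = '1,2018/08/14 17:06:15, 0,+ 22.24,+ 22.46';
C = textscan(line, '%*u %s %*u %f %f', 'Delimiter', ',', ...
             'WhiteSpace', '+', 'EmptyValue', NaN);
% C{1} should hold the date/time string, and C{2}, C{3}
% the values 22.24 and 22.46.
```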
Another option would be to pre-parse the file and remove the space between the + and the number. You could read the file using fileread, do a replacement using strrep or regexprep, then run textscan on the result.
datain = fileread('mydatafile.csv');
datain = strrep(datain,'+ ','+');
textscan(datain, formatSpec_data, data_rows, 'Delimiter', ',', 'TreatAsEmpty', {'+++++++'},'EmptyValue', NaN, 'ReturnOnError', 0 );
Finally, if you get stuck where you absolutely have to read the data as text and then convert it to numeric values, try str2doubleq, available on the MATLAB File Exchange. It is much faster than str2double or str2num.

matlab: delimit .csv file where no specific delimiter is available

I wonder if there is the possibility to read a .csv file looking like:
0,0530,0560,0730,....
90,15090,15290,157....
I should get:
0,053 0,056 0,073 0,...
90,150 90,152 90,157 90,...
when using dlmread(path, '') MATLAB spits out an error saying
Mismatch between file and format character vector.
Trouble reading 'Numeric' field from file (row 1, field number 2) ==> ,053 0,056 0,073 ...
I also tried using "0," as the delimiter, but MATLAB prohibits this.
Thanks,
jonnyx
str = importdata('file.csv', ''); % import the data as a cell array of char
for k = 1:length(str)             % loop over the lines
    str{k} = myfunc(str{k});      % apply the required operation
end
where
function new = myfunc(str)
    old = str(1:regexp(str, ',', 'once')); % characters up to and including the first comma
    % old is the pattern of the current line
    new = strrep(str, old, [' ', old]);    % add a space before that pattern
    new = new(2:end);                      % remove the space at the start
end
and file.csv :
0,0530,0560,073
90,15090,15290,157
Output:
>> str
str=
'0,053 0,056 0,073'
'90,150 90,152 90,157'
You can actually do this using textscan without any loops and using a few basic string manipulation functions:
fid = fopen('no_delim.csv', 'r');
C = textscan(fid, ['%[0123456789' 10 13 ']%[,]%3c'], 'EndOfLine', '');
fclose(fid);
C = strcat(C{:});
output = strtrim(strsplit(sprintf('%s ', C{:}), {'\n' '\r'})).';
And the output using your sample input file:
output =
2×1 cell array
'0,053 0,056 0,073'
'90,150 90,152 90,157'
How it works...
The format string specifies 3 items to read repeatedly from the file:
A string containing any number of characters from 0 through 9, newlines (ASCII code 10), or carriage returns (ASCII code 13).
A comma.
Three individual characters.
Each set of 3 items are concatenated, then all sets are printed to a string separated by spaces. The string is split at any newlines or carriage returns to create a cell array of strings, and any spaces on the ends are removed.
If you have access to a GNU / *NIX command line, I would suggest using sed to preprocess your data before feeding it into MATLAB. In this case the command would be: sed 's/,[0-9]\{3\}/& /g'
$ echo "90,15090,15290,157" | sed 's/,[0-9]\{3\}/& /g'
90,150 90,152 90,157
$ echo "0,0530,0560,0730,356" | sed 's/,[0-9]\{3\}/& /g'
0,053 0,056 0,073 0,356
You can also easily change the decimal commas to decimal points:
$ echo "0,053 0,056 0,073 0,356" | sed 's/,/./g'
0.053 0.056 0.073 0.356
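If a shell isn't available, the same preprocessing can be sketched directly in MATLAB/Octave with regexprep (file name taken from the answer above; in the replacement, $0 stands for the whole match):

```matlab
txt = fileread('file.csv');
% Append a space after every ',' followed by three digits,
% mirroring sed 's/,[0-9]\{3\}/& /g':
txt = regexprep(txt, ',\d{3}', '$0 ');
% Optionally turn the decimal commas into points afterwards:
% txt = strrep(txt, ',', '.');
```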

Using readtable on .csv with junk text after last cell

I'm using readtable to read a .csv which contains a line of gunk at the bottom:
ColA, ColB, ColC,
42 , foo , 1.1
666 , bar , 2.2
SomeGunk, 101
(yup, first line has a trailing ,, but that doesn't seem to be an issue)
... which upsets readtable:
>> readtable(file)
Error using readtable (line 197)
Reading failed at line 4. All lines of a text file must
have the same number of delimiters. Line 4 has 2 delimiters,
while preceding lines have 3.
Note: readtable detected the following parameters:
'Delimiter', ',', 'HeaderLines', 1, 'ReadVariableNames', false, 'Format',
'%f%q%q%q%q%f%f%f%D%D%q%q%f%f%f%f%f%f%f%f%f'
What can I do?
Is there anything short of reading the file and writing it back out again minus the last line? This seems really clumsy. And if I must do this, what's the cleanest way?
The readtable function lets you manually define a comment symbol. From the documentation:
For example, specify a character such as '%' to ignore text following the symbol on the same line. Specify a cell array of two character vectors, such as {'/*', '*/'}, to ignore any text between those sequences.
That means you can define 'SomeGunk' to be the comment symbol, i.e. any line starting with 'SomeGunk' will be ignored:
>> readtable('gunk.csv', 'Delimiter', ',', 'CommentStyle', 'SomeGunk')
ans =
2×3 table
Var1 Var2 Var3
____ ______ ____
42 'foo ' 1.1
666 'bar ' 2.2
This works only under the conditions that 1) the rubbish lines will always start with 'SomeGunk', 2) 'SomeGunk' does not appear anywhere else in the file, and 3) you don't need any other comment symbols.