Using readtable on .csv with junk text after last cell - matlab

I'm using readtable to read a .csv which contains a line of gunk at the bottom:
ColA, ColB, ColC,
42 , foo , 1.1
666 , bar , 2.2
SomeGunk, 101
(yup, first line has a trailing ,, but that doesn't seem to be an issue)
... which upsets readtable:
>> readtable(file)
Error using readtable (line 197)
Reading failed at line 4. All lines of a text file must
have the same number of delimiters. Line 4 has 2 delimiters,
while preceding lines have 3.
Note: readtable detected the following parameters:
'Delimiter', ',', 'HeaderLines', 1, 'ReadVariableNames', false, 'Format',
'%f%q%q%q%q%f%f%f%D%D%q%q%f%f%f%f%f%f%f%f%f'
What can I do?
Is there anything short of reading the file and writing it back out again minus the last line? This seems really clumsy. And if I must do this, what's the cleanest way?

The readtable function lets you manually define a comment symbol. From the documentation:
For example, specify a character such as '%' to ignore text following the symbol on the same line. Specify a cell array of two character vectors, such as {'/*', '*/'}, to ignore any text between those sequences.
That means you can define 'SomeGunk' as the comment symbol, so that 'SomeGunk' and any text following it on the same line is ignored:
>> readtable('gunk.csv', 'Delimiter', ',', 'CommentStyle', 'SomeGunk')
ans =
2×3 table
Var1 Var2 Var3
____ ______ ____
42 'foo ' 1.1
666 'bar ' 2.2
This works only under the conditions that 1) the rubbish lines will always start with 'SomeGunk', 2) 'SomeGunk' does not appear anywhere else in the file, and 3) you don't need any other comment symbols.
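If those conditions don't hold, another option is to drop the last line in memory rather than rewriting the file. The following is only a sketch under some assumptions: the junk is exactly the final line, the column types are number/text/number as in the sample, and the variable names are hard-coded (the file name gunk.csv and the sample contents are written out here just to make the example self-contained).

```matlab
% Recreate the sample file from the question (hypothetical file name).
fname = 'gunk.csv';
fid = fopen(fname, 'w');
fprintf(fid, 'ColA, ColB, ColC,\n42 , foo , 1.1\n666 , bar , 2.2\nSomeGunk, 101\n');
fclose(fid);

% Read everything as text and split into one cell per line.
txt  = fileread(fname);
rows = regexp(strtrim(txt), '\r?\n', 'split');

% Keep only the data lines: drop the header (first) and the junk (last).
body = strjoin(rows(2:end-1), char(10));

% Parse the remaining text directly from the character vector.
C = textscan(body, '%f %s %f', 'Delimiter', ',');

% strtrim removes the trailing spaces that textscan keeps in text fields.
T = table(C{1}, strtrim(C{2}), C{3}, 'VariableNames', {'ColA', 'ColB', 'ColC'});
```

Nothing is written back to disk, and T holds only the two data rows.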

Related

Trying to load text file into Octave GUI into the correct format

I have a text file that contains 10 columns separated by commas and X number of rows denoted by return. The initial header line of the data is a string. The first two columns are character strings while the last 8 columns are integers. So far I have tried fscanf:
t1 = fscanf( test, '%c', inf );
which will import the data as large 1 by XXXXX character matrix and textread which imports it but does not format it correctly:
[a,b,c,d,e,f,g,h,i,j] = textread("test.txt", "%s %s %s %s %s %s %s %s %s %s",\
'headerlines', 1);
I suspect it's a simple issue of formatting the notation on textread correctly to get my desired output. Any help is greatly appreciated.

MATLAB / Octave - how to parse CSV file with numbers and strings that contain commas

I have a CSV file that has 20 columns. Some of the columns have number values, others have text values, and the text ones may or may not contain commas.
CSV content example:
column1, column2, column3, column4
"text value 1", 123, "text, with a comma", 25
"another, comma", 456, "other text", 78
I'm using the textscan function, but I'm getting the most buggy and weird behavior. With some arguments, it reads all the values into only one column, sometimes it repeats columns, and most of the things I've tried lead to the commas being incorrectly interpreted as column separators (despite the text being enclosed in double quotes). That is, I've tried specifying the 'delimiter' argument, and also including literals in the format specification, to no avail.
What's the correct way of invoking textscan to deal with a CSV file as the example above? I'm looking for a solution that runs both on MATLAB and on Octave (or, if that's not possible, the equivalent solution in each one).
For GNU Octave, using io package
pkg load io
c = csv2cell ("jota.csv")
gives
c =
{
[1,1] = column1
[2,1] = text value 1
[3,1] = another, comma
[1,2] = column2
[2,2] = 123
[3,2] = 456
[1,3] = column3
[2,3] = text, with a comma
[3,3] = other text
[1,4] = column4
[2,4] = 25
[3,4] = 78
}
btw, you should explicitly mention if the solution should run on GNU Octave, Matlab or both
First, read the column headers using the format '%s' four times:
fileID = fopen(filename);
C_text = textscan(fileID,'%s', 4,'Delimiter',',');
Then use the conversion specifier, %q to read the text enclosed by double quotation marks ("):
C = textscan(fileID,'%q %d %q %d','Delimiter',',');
fclose(fileID);
(This works for reading your sample data on Octave. It should work on MATLAB, too.)
Edit: removed redundant fopen.

matlab: delimit .csv file where no specific delimiter is available

i wonder if there is the possibility to read a .csv file looking like:
0,0530,0560,0730,....
90,15090,15290,157....
i should get:
0,053 0,056 0,073 0,...
90,150 90,152 90,157 90,...
when using dlmread(path, '') matlab spits out an error saying
Mismatch between file and Format character vector.
Trouble reading 'Numeric' field from file (row 1, field number 2) ==> ,053 0,056 0,073 ...
i also tried using "0," as the delimiter but matlab prohibits this.
Thanks,
jonnyx
str= importdata('file.csv',''); %importing the data as a cell array of char
for k=1:length(str) %looping till the last line
str{k}=myfunc(str{k}); %applying the required operation
end
where
function new=myfunc(str)
old = str(1:regexp(str, ',', 'once')); %finding the characters till the first comma
%old is the pattern of the current line
new=strrep(str,old,[' ',old]); %adding a space before that pattern
new=new(2:end); %removing the space at the start
end
and file.csv :
0,0530,0560,073
90,15090,15290,157
Output:
>> str
str=
'0,053 0,056 0,073'
'90,150 90,152 90,157'
You can actually do this using textscan without any loops and using a few basic string manipulation functions:
fid = fopen('no_delim.csv', 'r');
C = textscan(fid, ['%[0123456789' 10 13 ']%[,]%3c'], 'EndOfLine', '');
fclose(fid);
C = strcat(C{:});
output = strtrim(strsplit(sprintf('%s ', C{:}), {'\n' '\r'})).';
And the output using your sample input file:
output =
2×1 cell array
'0,053 0,056 0,073'
'90,150 90,152 90,157'
How it works...
The format string specifies 3 items to read repeatedly from the file:
A string containing any number of characters from 0 through 9, newlines (ASCII code 10), or carriage returns (ASCII code 13).
A comma.
Three individual characters.
Each set of 3 items is concatenated, then all sets are printed to a string separated by spaces. The string is split at any newlines or carriage returns to create a cell array of strings, and any spaces on the ends are removed.
If you have access to a GNU / *NIX command line, I would suggest using sed to preprocess your data before feeding it into MATLAB. In this case the command would be: sed 's/,[0-9]\{3\}/& /g'
$ echo "90,15090,15290,157" | sed 's/,[0-9]\{3\}/& /g'
90,150 90,152 90,157
$ echo "0,0530,0560,0730,356" | sed 's/,[0-9]\{3\}/& /g'
0,053 0,056 0,073 0,356
You can also easily change the commas to decimal points:
$ echo "0,053 0,056 0,073 0,356" | sed 's/,/./g'
0.053 0.056 0.073 0.356
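If sed is not available, the same substitution can be done inside MATLAB with regexprep. This is a minimal sketch that makes the same assumption as the sed command, namely that exactly three digits follow each comma; the variable names are only illustrative.

```matlab
% Sample lines from the question, held in a cell array.
raw = {'0,0530,0560,073'; '90,15090,15290,157'};

% Insert a space after every comma-plus-three-digits group, then trim
% the trailing space left after the last group on each line.
spaced = strtrim(regexprep(raw, '(,\d{3})', '$1 '));

% Swap decimal commas for points and convert each line to a numeric row.
vals = cellfun(@(s) sscanf(strrep(s, ',', '.'), '%f').', spaced, ...
               'UniformOutput', false);
M = vertcat(vals{:});   % 2-by-3 numeric matrix
```

vertcat only works here because every line has the same number of values; for ragged input, keep vals as a cell array instead.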

How do I read comma separated values from a .txt file in MATLAB using textscan()?

I have a .txt file with rows consisting of three elements, a word and two numbers, separated by commas.
For example:
a,142,5
aa,3,0
abb,5,0
ability,3,0
about,2,0
I want to read the file and put the words in one variable, the first numbers in another, and the second numbers in another but I am having trouble with textscan.
This is what I have so far:
File = [LOCAL_DIR 'filetoread.txt'];
FID_File = fopen(File,'r');
[words,var1,var2] = textscan(File,'%s %f %f','Delimiter',',');
fclose(FID_File);
I can't seem to figure out how to use a delimiter with textscan.
horchler is indeed correct. You first need to open up the file with fopen which provides a file ID / pointer to the actual file. You'd then use this with textscan. Also, you really only need one output variable because each "column" will be placed as a separate column in a cell array once you use textscan. You also need to specify the delimiter to be the , character because that's what is being used to separate between columns. This is done by using the Delimiter option in textscan and you specify the , character as the delimiter character. You'd then close the file after you're done using fclose.
As such, you just do this:
File = [LOCAL_DIR 'filetoread.txt'];
f = fopen(File, 'r');
C = textscan(f, '%s%f%f', 'Delimiter', ',');
fclose(f);
Take note that the formatting string has no spaces because the delimiter flag will take care of that work. Don't add any spaces. C will contain a cell array of columns. Now if you want to split up the columns into separate variables, just access the right cells:
names = C{1};
num1 = C{2};
num2 = C{3};
These are what the variables look like now by putting the text you provided in your post to a file called filetoread.txt:
>> names
names =
'a'
'aa'
'abb'
'ability'
'about'
>> num1
num1 =
142
3
5
3
2
>> num2
num2 =
5
0
0
0
0
Take note that names is a cell array of names, so accessing the right name is done by simply doing n = names{ii}; where ii is the index of the name you want to access. You'd access the values in the other two variables using the normal indexing notation (i.e. n = num1(ii); or n = num2(ii);).
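If you would rather have the two numeric columns together in one matrix, textscan's 'CollectOutput' option merges adjacent columns of the same type. Here is a small sketch that parses the sample text from a character vector instead of a file, just to keep it self-contained; the variable names are illustrative.

```matlab
% A few rows of the sample data as one character vector.
txt = sprintf('a,142,5\naa,3,0\nabb,5,0\n');

% With CollectOutput, consecutive numeric columns share one cell.
C = textscan(txt, '%s%f%f', 'Delimiter', ',', 'CollectOutput', true);

names = C{1};   % N-by-1 cell array of words
nums  = C{2};   % N-by-2 double: [firstNumber secondNumber]
```

Row ii of nums then lines up with names{ii}.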

trying to use "," delimiter in octave

I am trying to use the textscan function. Here is the data I am trying to read:
"0", "6/23/2015 12:21:59 PM", "93.161", "95.911","94.515","95.917", "-5511.105","94.324","-1415.849","2.376","2.479"
"1", "6/23/2015 12:22:02 PM", "97.514", "96.068","94.727","96.138","-12500.000","94.540","-8094.912","2.386","2.479"
The data logger I am using puts quotes around all values even though they are numbers. If the values were separated only by commas, with no quotes, I could just use csvread. You can see some of my commented-out failed attempts. Here is the code I have been trying:
fileID = fopen('test3.txt');
%C = textscan(fileID,'"%f%s%f%f%f%f%f%f%f%f%f"', 'delimiter', '","');
C = textscan(fileID,'"%f","%s","%f","%f","%f","%f","%f","%f","%f","%f","%f"');
%C = textscan(fileID,'%s', 'delimiter', '"');
%C = strread(fileID, "%s %s %f %f %f %f %f %f %f %f %f", ",");
fclose(fileID);
celldisp(C)
If I run line 3 I get:
C{1} =
NaN
NaN
94.324
NaN
... omitted lines here ...
NaN
99.546
NaN
If I run lines 4, 5, or 6, I get:
warning: strread: unable to parse text or file with given format string
warning: called from
strread at line 688 column 7
textscan at line 318 column 8
test2 at line 4 column 3
error: some elements undefined in return list
error: called from
textscan at line 318 column 8
test2 at line 4 column 3
You want the magic word. The magic word is not please here, it's multipledelimsasone.
Basically, you want both " and , to be treated as delimiter characters. textscan looks for any of the delimiter characters, not for them in a given order, which is why '","' didn't do what you expected. Turning multipledelimsasone on makes textscan treat any combination of " and , as a single delimiter.
C = textscan(fileID,'%f%s%f%f%f%f%f%f%f%f%f', 'delimiter', '," ','multipledelimsasone',1);
Without this option on, what textscan thinks is happening is lots of empty values; the delimiter list isn't taken as an ordered sequence, just a set of possible separator characters. So if it sees ",", it thinks you have three delimiters with nothing in between → two empty values → NaN.
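Since the timestamp field itself contains spaces, a different route that sidesteps the delimiter question is to strip the quote characters first and then split on commas only. This is a sketch of that alternative, not the multipledelimsasone approach above. It assumes MATLAB's behaviour of reading %s up to the next delimiter when 'Delimiter' is given (Octave may differ), and it parses the two sample rows from a character vector rather than from test3.txt.

```matlab
% The two sample rows from the question as one character vector.
txt = sprintf('%s\n', ...
    '"0", "6/23/2015 12:21:59 PM", "93.161", "95.911","94.515","95.917", "-5511.105","94.324","-1415.849","2.376","2.479"', ...
    '"1", "6/23/2015 12:22:02 PM", "97.514", "96.068","94.727","96.138","-12500.000","94.540","-8094.912","2.386","2.479"');

% Remove every double quote so that only commas separate the fields.
clean = strrep(txt, '"', '');

% One number, the timestamp as text, then nine more numbers.
fmt = ['%f%s' repmat('%f', 1, 9)];
C = textscan(clean, fmt, 'Delimiter', ',');

stamps = C{2};   % timestamps, spaces preserved
```

For a file, the same steps apply after reading its contents with fileread.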