matlab: delimit .csv file where no specific delimiter is available - matlab

I wonder whether it is possible to read a .csv file that looks like this:
0,0530,0560,0730,....
90,15090,15290,157....
I should get:
0,053 0,056 0,073 0,...
90,150 90,152 90,157 90,...
When using dlmread(path, ''), MATLAB throws an error saying
Mismatch between file and Format character vector.
Trouble reading 'Numeric' field from file (row 1, field number 2) ==> ,053 0,056 0,073 ...
I also tried using "0," as the delimiter, but MATLAB does not allow this.
Thanks,
jonnyx

str= importdata('file.csv',''); %importing the data as a cell array of char
for k=1:length(str) %looping till the last line
str{k}=myfunc(str{k}); %applying the required operation
end
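where myfunc is defined as follows (it assumes that every value on a line shares the same integer part, i.e. the text before the first comma):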
function new=myfunc(str)
old = str(1:regexp(str, ',', 'once')); %finding the characters till the first comma
%old is the pattern of the current line
new=strrep(str,old,[' ',old]); %adding a space before that pattern
new=new(2:end); %removing the space at the start
end
and file.csv :
0,0530,0560,073
90,15090,15290,157
Output:
>> str
str=
'0,053 0,056 0,073'
'90,150 90,152 90,157'

You can actually do this using textscan without any loops and using a few basic string manipulation functions:
fid = fopen('no_delim.csv', 'r');
C = textscan(fid, ['%[0123456789' 10 13 ']%[,]%3c'], 'EndOfLine', '');
fclose(fid);
C = strcat(C{:});
output = strtrim(strsplit(sprintf('%s ', C{:}), {'\n' '\r'})).';
And the output using your sample input file:
output =
2×1 cell array
'0,053 0,056 0,073'
'90,150 90,152 90,157'
How it works...
The format string specifies 3 items to read repeatedly from the file:
1. A string containing any number of characters from 0 through 9, newlines (ASCII code 10), or carriage returns (ASCII code 13).
2. A comma.
3. Three individual characters.
Each set of 3 items is concatenated, then all sets are printed to a single string, separated by spaces. That string is split at any newlines or carriage returns to create a cell array of strings, and any spaces on the ends are removed.

If you have access to a GNU/*NIX command line, I would suggest using sed to preprocess your data before feeding it into MATLAB. In this case the command would be: sed 's/,[0-9]\{3\}/& /g'
$ echo "90,15090,15290,157" | sed 's/,[0-9]\{3\}/& /g'
90,150 90,152 90,157
$ echo "0,0530,0560,0730,356" | sed 's/,[0-9]\{3\}/& /g'
0,053 0,056 0,073 0,356
You can also easily change the decimal commas (,) to decimal points (.):
$ echo "0,053 0,056 0,073 0,356" | sed 's/,/./g'
0.053 0.056 0.073 0.356
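If sed is not available, the same substitution can be sketched inside MATLAB with regexprep. This is only a minimal sketch, assuming (as in the question) that every value has exactly three decimal digits; the sample lines and the variable names spaced and values are illustrative.

```matlab
% Illustrative sample lines standing in for the contents of the .csv file
lines = {'0,0530,0560,073'; '90,15090,15290,157'};

% Insert a space after each comma plus three digits, but only when
% another digit follows, so no trailing space is added at the line end
spaced = regexprep(lines, '(,\d{3})(?=\d)', '$1 ');

% Turn decimal commas into points and parse every line into a numeric row
% (assumes all lines hold the same number of values)
values = cellfun(@(s) str2double(strsplit(strrep(s, ',', '.'), ' ')), ...
                 spaced, 'UniformOutput', false);
values = vertcat(values{:});
```

The lookahead (?=\d) is the only difference from the sed pattern; it keeps the last value of each line from picking up a trailing space.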

Related

Using readtable on .csv with junk text after last cell

I'm using readtable to read a .csv which contains a line of gunk at the bottom:
ColA, ColB, ColC,
42 , foo , 1.1
666 , bar , 2.2
SomeGunk, 101
(yup, first line has a trailing ,, but that doesn't seem to be an issue)
... which upsets readtable:
>> readtable(file)
Error using readtable (line 197)
Reading failed at line 4. All lines of a text file must
have the same number of delimiters. Line 4 has 2 delimiters,
while preceding lines have 3.
Note: readtable detected the following parameters:
'Delimiter', ',', 'HeaderLines', 1, 'ReadVariableNames', false, 'Format',
'%f%q%q%q%q%f%f%f%D%D%q%q%f%f%f%f%f%f%f%f%f'
What can I do?
Is there anything short of reading the file and writing it back out again minus the last line? This seems really clumsy. And if I must do this, what's the cleanest way?
The readtable function lets you manually define a comment symbol. From the documentation:
For example, specify a character such as '%' to ignore text following the symbol on the same line. Specify a cell array of two character vectors, such as {'/*', '*/'}, to ignore any text between those sequences.
That means you can define 'SomeGunk' to be the comment symbol, i.e. any line starting with 'SomeGunk' will be ignored:
>> readtable('gunk.csv', 'Delimiter', ',', 'CommentStyle', 'SomeGunk')
ans =
2×3 table
Var1 Var2 Var3
____ ______ ____
42 'foo ' 1.1
666 'bar ' 2.2
This works only under the conditions that 1) the rubbish lines will always start with 'SomeGunk', 2) 'SomeGunk' does not appear anywhere else in the file, and 3) you don't need any other comment symbols.
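If the junk text is not predictable enough to serve as a comment symbol, one alternative is to drop the last line in memory rather than rewriting the file. The following is a rough sketch, not a tested replacement for readtable: it assumes the junk is always exactly the last line, that the columns are number, text, number, and it uses a temporary file and variable names that are purely illustrative.

```matlab
% Write the sample from the question to a temporary file (illustrative name)
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'ColA, ColB, ColC,\n42 , foo , 1.1\n666 , bar , 2.2\nSomeGunk, 101\n');
fclose(fid);

% Read the whole file as text and split it into lines
raw   = fileread(fname);
lines = regexp(strtrim(raw), '\r?\n', 'split');

% Keep only the data rows: skip the header (first) and the junk (last) line
body = lines(2:end-1);

% Parse the remaining text directly; textscan accepts a character vector
C = textscan(strjoin(body, sprintf('\n')), '%f%s%f', 'Delimiter', ',');

% Build the table by hand, trimming the padding around the text column
T = table(C{1}, strtrim(C{2}), C{3}, 'VariableNames', {'ColA', 'ColB', 'ColC'});
delete(fname);
```

The variable names are supplied manually here because the header line is skipped along with its trailing comma.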

How do I read comma separated values from a .txt file in MATLAB using textscan()?

I have a .txt file with rows consisting of three elements, a word and two numbers, separated by commas.
For example:
a,142,5
aa,3,0
abb,5,0
ability,3,0
about,2,0
I want to read the file and put the words in one variable, the first numbers in another, and the second numbers in another but I am having trouble with textscan.
This is what I have so far:
File = [LOCAL_DIR 'filetoread.txt'];
FID_File = fopen(File,'r');
[words,var1,var2] = textscan(File,'%s %f %f','Delimiter',',');
fclose(FID_File);
I can't seem to figure out how to use a delimiter with textscan.
horchler is indeed correct. You first need to open the file with fopen, which returns a file ID, and then pass that ID (not the file name) to textscan. You also only need one output variable, because textscan returns a single cell array with one cell per column. Specify the , character as the separator through the Delimiter option, and close the file with fclose once you are done.
As such, you just do this:
File = [LOCAL_DIR 'filetoread.txt'];
f = fopen(File, 'r');
C = textscan(f, '%s%f%f', 'Delimiter', ',');
fclose(f);
Note that the format string contains no spaces, because the Delimiter option takes care of separating the fields. Don't add any spaces. C will be a cell array with one cell per column. If you want to split the columns into separate variables, just access the corresponding cells:
names = C{1};
num1 = C{2};
num2 = C{3};
This is what the variables look like after saving the text from your post to a file called filetoread.txt:
>> names
names =
'a'
'aa'
'abb'
'ability'
'about'
>> num1
num1 =
142
3
5
3
2
>> num2
num2 =
5
0
0
0
0
Note that names is a cell array of names, so a single name is accessed with n = names{ii}; where ii is the index of the name you want. You'd access the values in the other two variables using normal indexing notation (i.e. n = num1(ii); or n = num2(ii);).
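To try the call without creating a file, note that textscan also accepts a character vector as its first argument and parses that text directly. This is also why the code in the question failed: passing File handed textscan the file name itself as the data. A minimal sketch, with txt as an illustrative stand-in for the first three rows of the file:

```matlab
% Illustrative stand-in for the file contents (first three rows)
txt = sprintf('a,142,5\naa,3,0\nabb,5,0\n');

% Same format string and Delimiter option as above, applied to the text itself
C = textscan(txt, '%s%f%f', 'Delimiter', ',');

names = C{1};   % cell array of character vectors
num1  = C{2};   % numeric column vector
num2  = C{3};   % numeric column vector
```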

Read file names from .txt file in MATLAB

I am attempting to read multiple file names from a .txt file. Each file name contains multiple spaces, and the names end in different file extensions.
When I try this code
M = textread('playlist.m3u', '%s')
the result is split at every space: the first string of the first line goes into the first row, the string after the next space goes into the next row, etc.
One of the file names in the text file is "C:\Users\user\Music\Pink Floyd\Wish You Were Here (Matersound Gold Limited Edition)\03 - Have a Cigar.flac"
'C:\Users\user\Music\Pink'
'Floyd\Wish'
'You'
'Were'
'Here'
'(Matersound'
'Gold'
'Limited'
'Edition)\03'
'-'
'Have'
'a'
'Cigar.flac'
How do I simply read in all the file names, with each one taking up one cell in a cell array?
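Use textscan and specify the newline \n as the delimiter, so that spaces no longer split the entries:
fid = fopen('playlist.m3u');
M = textscan(fid, '%s', 'delimiter', '\n');
fclose(fid);
M = M{1}; % textscan wraps the result in a 1x1 cell; unwrap it to get one file name per cell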
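The effect of the newline delimiter can be checked on a character vector instead of a file. This is a small sketch; the two paths are made up for illustration and are not taken from the playlist in the question.

```matlab
% Two made-up playlist entries containing spaces, one per line
txt = sprintf('C:\\Music\\Pink Floyd\\01 - Track One.flac\nC:\\Music\\Other Band\\02 - Song.mp3\n');

% With '\n' as the only delimiter, spaces stay inside each entry
M = textscan(txt, '%s', 'Delimiter', '\n');

% textscan returns a 1x1 cell holding the column of entries; unwrap it
files = M{1};
```

Each element of files then holds one complete path, spaces included.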

How to read digits from file to matrix, no delimiter

I have data stored in the format below, with no delimiter; the digits are drawn from {0,1}. Using Octave, I have had trouble reading the digits and storing them in a matrix, and I have not managed either of the scenarios below. How can I read those digits and store them in a matrix as described below?
Data in File, 32 x 32 digits
00000000000000000000000000000000
00000000001111110000000000000000
...
00000010000000100001000000000000
How to store the data:
matrix[1, 1:32] = 00000000000000000000000000000000
matrix[2, 1:32] = 00000000001111110000000000000000
. . .
matrix[32, 1:32] = 00000010000000100001000000000000
OR
matrix[1, 1:32] = 00000000000000000000000000000000
matrix[1, 33:64] = 00000000001111110000000000000000
. . .
matrix[1, 993:1024] = 00000010000000100001000000000000
One possible solution is to read the data as a string first:
octave> textread('foo.dat', '%s', 'headerlines', 2)
ans =
{
[1,1] = 00000000000000000000000000000000
[2,1] = 00000000001111110000000000000000
...
}
If these are binary representations of decimals, you may find bin2dec() useful.
This would do the trick (though I don't know how well that third input to fread and arrayfun work with Octave, tested this on Matlab):
fid = fopen('a.txt','rt');
str = fread(fid,inf,'char=>char');
st = fclose(fid);
qrn = str==10|str==13;
str(qrn) = [];
yourMat = reshape(arrayfun(@str2num,str),find(qrn,1)-1,[]).'
Assuming you don't have header lines, you can read the text in as a cell array of strings like so:
C = textread('names.txt', '%s');
Then, for any digits from 0 to 9, you can transform this into a numeric matrix like so:
M = vertcat(C{:})-'0';
If performance is an issue you can look into other ways to import the strings, but this should get the job done.
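The character arithmetic can be checked on a small made-up example. The sketch below uses two short rows instead of the 32 x 32 file and also shows the single-row layout asked for in the question; the names C, M and rowVec are illustrative.

```matlab
% Two made-up rows of digits, as they would come back from reading the file
C = {'0010'; '1100'};

% Stack the rows into a char matrix, then subtract the character '0'
% so that each digit character becomes its numeric value
M = vertcat(C{:}) - '0';

% Optional: one long row vector, with the file's rows laid end to end
rowVec = reshape(M.', 1, []);
```

The transpose before reshape matters because MATLAB and Octave read matrices column by column.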
I have never used MATLAB, but assuming it reads files the same way Octave does, and if using an external tool is OK, you could try replacing the characters to add a delimiter using a text editor. You could change every "0" to "0," and every "1" to "1," and then simply load the file.
(This would add a delimiter at the end of every line. In case that creates a problem, you could try replacing your text by pairs instead "00"->"0,0" "10" -> "1,0" and so on)
In case the file is too big for a normal editor, you might even try replacing the characters with sed:
sed -i 's/charactertoreplace/newcharacter/g' yourfile.txt

Nested textscan statements

The following two statements read the first line from an input file (fid) and parse said line into strings delimited by whitespace.
a = textscan(fid,'%s',1,'Delimiter','\n');
b = textscan(a{1}{1},'%s');
I would like to know if this action can be accomplished in a single statement, having a form similar to the following (which is syntactically invalid).
b = textscan(textscan(fid,'%s',1,'Delimiter','\n'),'%s');
Thanks.
Instead of
a = textscan(fid, '%s', 1, 'Delimiter', '\n');
you can use
a = fgetl(fid);
That will return the next line in fid as a string (the newline character at the end is stripped). You can then split that line into white-space separated chunks as follows:
b = regexp(a, '\s*', 'split');
Combined:
b = regexp(fgetl(fid), '\s*', 'split');
Note that this is not 100% equivalent to your code, since using textscan adds another cell-layer (representing different lines in the file). That's not a problem, though, simply use
b = {regexp(fgetl(fid), '\s*', 'split')};
if you need that extra cell-layer.
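Two details are worth guarding against when swapping textscan for fgetl: fgetl returns the number -1 (not a string) at the end of the file, and splitting a line that starts with whitespace yields an empty first cell. A minimal sketch, using a made-up line in place of the result of fgetl(fid):

```matlab
% Made-up line, as fgetl would return it (no trailing newline)
line = sprintf('  alpha\tbeta   gamma ');

b = {};
if ischar(line)   % fgetl returns -1 (numeric) once the end of file is reached
    % Trim first so leading or trailing whitespace does not create empty cells,
    % then split on runs of whitespace
    b = regexp(strtrim(line), '\s+', 'split');
end
```

Here b is a row cell array; use b(:) if a column is needed, as textscan would return.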