Escape characters in Matlab - matlab

I am reading a file using fileread(), which returns the entire file as one string. Now I need to go through it line by line and process the data. How can I detect the newline character in Matlab? I tried '\n' and '\r\n', but neither works.
Thanks in advance

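For special characters, either use the char function with the character code (http://www.asciitable.com/) or sprintf (my preferred way, for better readability). In Matlab, '\n' in single quotes is just the two characters backslash and n; the escape is only interpreted by functions such as sprintf and fprintf.
For example, you are looking for sprintf('\n') or sprintf('\r\n').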
char(13) is carriage return \r
char(10) is new line \n
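To see the difference in practice, here is a minimal sketch. The sample text and the variable names (txt, lines) are made up for illustration; in real use txt would come from fileread. It checks that '\n' in single quotes is two characters while sprintf('\n') is one, and then splits the text on either line-ending style with regexp, which interprets \r and \n in its pattern by itself.

```matlab
% '\n' in single quotes is two literal characters: backslash and n
assert(numel('\n') == 2);
% sprintf interprets the escape and returns the single newline character
assert(isequal(sprintf('\n'), char(10)));

% Hypothetical sample text mixing Windows (\r\n) and Unix (\n) line endings;
% in practice this would be the result of fileread
txt = sprintf('first\r\nsecond\nthird');

% regexp understands \r and \n in its pattern, so no sprintf is needed here
lines = regexp(txt, '\r?\n', 'split');

% Process each line in turn
for k = 1:numel(lines)
    fprintf('%d: %s\n', k, lines{k});
end
```

Splitting on '\r?\n' is one choice among several; it has the advantage of handling files from both Windows and Unix without knowing in advance which one you have.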

You can read the file line by line (see fgetl):
fid = fopen ( 'file', 'r' );
% Check that it opened okay
if fid ~= -1
  while ( true )
    line = fgetl ( fid );
    % fgetl returns the number -1 (not a char array) at end of file
    if ~ischar ( line ); break; end
    % Do stuff with line
  end
  fclose ( fid );
end

Related

Add additional string to printing

Hi, I would like to print a string and then keep adding dots to the end of it, rather than reprinting the whole string every time. I want only the dots to be appended to the string that has already been printed.
reboot = '### rebooting the mmp';
display(reboot)
for i = 1 : 15
reboot = strcat(reboot,'.')
pause(1);
end
How would I do this?
Rather than printing out the entire string every time, you can just print out a new dot each time through the loop.
To make this work, use fprintf to print the dot rather than disp. disp automatically appends a newline to the end and fprintf does not, so with fprintf all of the dots end up on the same line.
% Print the initial message without a trailing newline
fprintf('### rebooting the mmp');
% Print 5 dots all on the same line with a 1-second pause
for k = 1:5
    fprintf('.')
    pause(1)
end
% We DO want to print a newline after we're all done
fprintf('\n')
fprintf(reboot)
for i = 1:15
    fprintf('.')
    pause(1)
end

Altering multiple text files using grep awk sed perl or something else

I have multiple text files named split01.txt, split02.txt etc... with the data in the format below: (This is what I have)
/tmp/audio_files/n000001.wav;
/tmp/audio_files/n000002.wav;
/tmp/audio_files/n000003.wav;
/tmp/audio_files/p000004.wav;
/tmp/audio_files/p000005.wav;
I would like to create another file with the data taken from the split01.txt, split02.txt etc... file in the format below: (this is the format I would like to see)
[playlist]
NumberOfEntries=5
File000001=n000001.wav
Title000001=n000001.wav
File000002=n000002.wav
Title000002=n000002.wav
File000003=n000003.wav
Title000003=n000003.wav
File000004=p000004.wav
Title000004=p000004.wav
File000005=p000005.wav
Title000005=p000005.wav
Version=2
Can this be done in a single command? The reason I ask is that I'm going to be calling the command (awk, grep, sed, etc.) from inside octave/matlab after the initial process has finished creating the audio files.
Here is an example of what I mean by a single command (matlab/octave code):
system(strcat({'split --lines=3600 -d '},dirpathwaveformstmp,fileallplaylistStr,{' '},dirpathwaveformstmp,'allsplit'))
This splits a single file into multiple files with the names allsplit01 allsplit02 etc.. and each file only has a max of 3600 lines.
For those who asked this is creating playlist files for audio files I create with octave/matlab.
Any suggestions?
Here's one way you could do it with awk:
parse.awk
BEGIN {
  print "[playlist]"
  print "NumberOfEntries=" len "\n"
  i = 1
}
{
  gsub(".*/|;", "")
  printf "File%06d=%s\n" , i, $0
  printf "Title%06d=%s\n\n", i, $0
  i++
}
END {
  print "Version=2"
}
Run it like this:
awk -v len=$(wc -l < infile) -f parse.awk infile
Output (blank lines omitted):
[playlist]
NumberOfEntries=5
File000001=n000001.wav
Title000001=n000001.wav
File000002=n000002.wav
Title000002=n000002.wav
File000003=n000003.wav
Title000003=n000003.wav
File000004=p000004.wav
Title000004=p000004.wav
File000005=p000005.wav
Title000005=p000005.wav
Version=2
If you're writing your program in Octave, why don't you do it in Octave as well? The language is not limited to numerical analysis. What you're trying to do can be done quite easily with Octave functions.
filepath = "path for input file"
playlistpath = "path for output file"
## read file and prepare cell array for printing
files = strsplit (fileread (filepath)', "\n");
if (isempty (files{end}))
files(end) = [];
endif
[~, names, exts] = cellfun (@fileparts, files, "UniformOutput", false);
files = strcat (names, exts);
files(2,:) = files(1,:);
files(4,:) = files(1,:);
files(1,:) = num2cell (1:columns(files))(:);
files(3,:) = num2cell (1:columns(files))(:);
## write playlist
[fid, msg] = fopen (playlistpath, "w");
if (fid < 0)
error ("Unable to fopen %s for writing: %s", playlistpath, msg);
endif
fprintf (fid, "[playlist]\n");
fprintf (fid, "NumberOfEntries=%i\n", columns (files));
fprintf (fid, "\n");
fprintf (fid, "File%06d=%s\nTitle%06d=%s\n\n", files{:});
fprintf (fid, "Version=2");
if (fclose (fid))
error ("Unable to fclose file %s with FID %i", playlistpath, fid);
endif

Reading comma in content of csv file, matlab

I have a csv file whose values contain commas.
% with dot
15.12.2012 11:27; 0.9884753
11.12.2012 11:12; 10.670.642
11.12.2012 10:57; 114.455.145
Gdata= textscan(fid, '%s %f')
It works well.
% but what to do with comma
15.12.2012 11:27; 0,9884753
11.12.2012 11:12; 10,670.642
11.12.2012 10:57; 114,455.145
How can I read it?
regards,
This may solve possible inconsistencies due to the presence of both ',' and '.':
fid = fopen('data.d','r');
Gdata= textscan(fid, '%s %s','delimiter', ';' )
% // cancels '.' and sets ',' as '.'
f = @(i) str2double(regexprep(regexprep(i,'\.',''),',','\.'));
Num = cellfun(f,Gdata(2),'UniformOutput' , false);
Num{:}
ans =
0.9885
10.6706
114.4551
Unfortunately, textscan doesn't respect locale settings, so there's no way to make it interpret the comma as a decimal point by modifying the current locale. As a workaround, you could read the entire line in, replace the comma with a dot and then use textscan to parse the line.
line = fgetl( fid );
line = strrep( line, ',', '.' );
Gdata = textscan( line, '%s %f' );
You may have to resort to regexp or something else fancier than a simple strrep if the line may contain commas that you don't want replaced.
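As one possible sketch of the regexp route, the snippet below replaces a comma only when it has a digit on both sides, using lookaround in regexprep. The sample line and the variable names (line, fixed, parts, value) are invented for illustration, and it assumes the comma is always the decimal separator and that the number contains no other dot.

```matlab
% Hypothetical input line: the text field contains a comma that must survive
line = 'run A, take 2; 0,9884753';

% Replace a comma only when there is a digit directly before and after it
fixed = regexprep(line, '(?<=\d),(?=\d)', '.');

% Split on the semicolon and convert the numeric field
parts = strsplit(fixed, ';');
value = str2double(strtrim(parts{2}));

fprintf('%s\n', fixed);
fprintf('%.7f\n', value);
```

The comma after "run A" is left alone because it is not surrounded by digits, which a plain strrep cannot distinguish.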

Write unicode strings to a file in Matlab

I have a string containing Urdu characters, such as 'بجلی', stored as a 1x4 array. I want to save it to a file that will be viewed externally. The string does not display in the main Command Window, but the variable 'str' does hold it. When I save it using fprintf(fid, str) and open the file in Notepad, 'arrows' appear instead of the original characters. I can easily paste the characters into Notepad manually. Where is the problem?
You need to use fwrite() not fprintf():
fid = fopen('temp.txt', 'w');
str = char([1576, 1580, 1604, 1740, 10]);
encoded_str = unicode2native(str, 'UTF-8');
fwrite(fid, encoded_str, 'uint8');
fclose(fid);
Verified with:
perl -E "open my $fh, q{<:utf8}, q{temp.txt}; while (<$fh>) {while (m/(.)/g) {say ord $1}}"
1576
1580
1604
1740
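A similar check can be sketched from within MATLAB, without Perl, by round-tripping the string through unicode2native and native2unicode. The variable names (bytes, decoded) are illustrative only; the code points are the ones used above, without the trailing newline.

```matlab
% Code points of the four Urdu characters used above
str = char([1576, 1580, 1604, 1740]);

% Encode to UTF-8 bytes (each of these characters takes two bytes)
bytes = unicode2native(str, 'UTF-8');
fprintf('%02X ', bytes);
fprintf('\n');

% Decode the bytes again and confirm that nothing was lost
decoded = native2unicode(bytes, 'UTF-8');
assert(isequal(double(decoded), [1576, 1580, 1604, 1740]));
```

This only checks the encoding step in memory; it does not replace inspecting the file that was actually written.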
It's not really necessary to avoid fprintf in order to write UTF-8 strings to a file. The idea is to open the file correctly, with the encoding specified:
f = fopen('temp.txt', 'w', 'native', 'UTF-8');
s = char([1576, 1580, 1604, 1740]);
fprintf(f, 'This is written as UTF-8: %s.\n', s);
fclose(f);
Looking up every character in a character map may seem hard. The code can be modified to use the characters directly:
fid = fopen('temp.txt', 'w');
str = char(['س','ل','ا','م']);
encoded_str = unicode2native(str, 'UTF-8');
fwrite(fid, encoded_str, 'uint8');
fclose(fid);
This seems to be easier, but the only drawback is that it requires you to have Arabic/Persian/Urdu, etc. installed.

Renaming names in a file using another file without using loops

I have two files:
(one.txt) looks Like this:
>ENST001
(((....)))
(((...)))
>ENST002
(((((((.......))))))
((((...)))
I have like 10000 more ENST
(two.txt) looks like this:
>ENST001 110
>ENST002 59
and so on for the rest of all ENSTs
I basically would like to replace the ENSTs in the (one.txt) by the combination of the two fields in the (two.txt) so the results will look like this:
>ENST001_110
(((....)))
(((...)))
>ENST002_59
(((((((.......))))))
((((...)))
I wrote a matlab script to do so, but since it loops over all lines in (two.txt) for every header, it takes about 6 hours to finish. I think that with awk, sed, grep, or even perl we could get the result in a few minutes. This is what I did in matlab:
frf = fopen('one.txt', 'r');
frp = fopen('two.txt', 'r');
fw = fopen('result.txt', 'w');
while feof(frf) == 0
    line = fgetl(frf);
    first_char = line(1);
    if strcmp(first_char, '>') == 1 % if the line in one.txt starts with > it is the ID
        id_fold = strrep(line, '>', ''); % Remove the > symbol
        frewind(frp) % Rewind two.txt file after each loop
        while feof(frp) == 0
            raw = fgetl(frp);
            scan = textscan(raw, '%s%s');
            id_pos = scan{1}{1};
            pos = scan{2}{1};
            if strcmp(id_fold, id_pos) == 1 % if both ids are the same
                id_new = ['>', id_fold, '_', pos];
                fprintf(fw, '%s\n', id_new);
            end
        end
    else
        fprintf(fw, '%s\n', line); % if the line doesn't start with > print it to results
    end
end
One way using awk. The FNR == NR condition processes the first file in the arguments (two.txt) and saves each number in an array keyed by its ID. The second condition processes the second file (one.txt): when the first field matches a key in the array, it modifies that line by appending the number.
awk '
    FNR == NR {
        data[ $1 ] = $2;
        next
    }
    FNR < NR && data[ $1 ] {
        $0 = $1 "_" data[ $1 ]
    }
    { print }
' two.txt one.txt
Output:
>ENST001_110
(((....)))
(((...)))
>ENST002_59
(((((((.......))))))
((((...)))
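For comparison, the same lookup-table idea can be sketched in MATLAB with containers.Map. This is only an illustration under stated assumptions: the cell arrays one and two are hypothetical in-memory stand-ins for the lines of one.txt and two.txt, and the names tok, lookup and result are made up. It still uses one loop over the lines of one.txt, but each header is resolved by a single map lookup instead of rescanning two.txt.

```matlab
% Hypothetical in-memory stand-ins for the lines of one.txt and two.txt
one = {'>ENST001'; '(((....)))'; '>ENST002'; '((((...)))'};
two = {'>ENST001 110'; '>ENST002 59'};

% Build the lookup table: header (e.g. '>ENST001') -> number (e.g. '110')
tok = regexp(two, '^(\S+)\s+(\S+)', 'tokens', 'once');
tok = vertcat(tok{:});                  % N-by-2 cell: {header, number}
lookup = containers.Map(tok(:,1), tok(:,2));

% Single pass over one.txt: rewrite only the lines that are known headers
result = one;
for k = 1:numel(one)
    if isKey(lookup, one{k})
        result{k} = [one{k} '_' lookup(one{k})];
    end
end

fprintf('%s\n', result{:});
```

Lines that are not keys in the map, such as the structure lines, pass through unchanged, which mirrors the final { print } rule in the awk script.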
With sed, you can first run over two.txt alone to generate the sed commands that perform the replacements you want, and then run those commands on one.txt:
First way
sed "$(sed -n '/>ENST/{s=.*\(ENST[0-9]\+\)\s\+\([0-9]\+\).*=s/\1/\1_\2/;=;p}' two.txt)" one.txt
Second way
If the files are huge, you'll get a "too many arguments" error with the previous way. There is another way that avoids this error. You need to execute all three commands one by one:
sed -n '1i#!/bin/sed -f
/>ENST/{s=.*\(ENST[0-9]\+\)\s\+\([0-9]\+\).*=s/\1/\1_\2/;=;p}' two.txt > script.sed
chmod +x script.sed
./script.sed one.txt
The first command generates the sed script that will modify one.txt as you want. chmod makes this new script executable, and the last command runs it. So each file is read only once, and there are no loops.
Note that the first command spans two lines but is still one command. If you delete the newline character, the script will break, because of how the i command in sed works. You can look for details in the sed man page.
This Perl solution sends the modified one.txt file to STDOUT.
use strict;
use warnings;
open my $f2, '<', 'two.txt' or die $!;
my %ids;
while (<$f2>) {
$ids{$1} = "$1_$2" if /^>(\S+)\s+(\d+)/;
}
open my $f1, '<', 'one.txt' or die $!;
while (<$f1>) {
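# \s*$ also consumes the newline, so put it back; leave unknown IDs untouched
s/^>(\S+)\s*$/>$ids{$1}\n/ if /^>(\S+)/ && exists $ids{$1};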
print;
}
Turn the problem on its head. In perl I would do something like this:
#!/usr/bin/perl
use strict;
use warnings;
open(FH1, "one.txt") or die $!;
open(FH2, "two.txt") or die $!;
open(RESULT, ">result.txt") or die $!;
my %data;
while (my $line = <FH2>)
{
    chomp($line);
    # Delete leading angle bracket
    $line =~ s/^>//;
    # split enst and pos
    my ($enst, $pos) = split(/\s+/, $line);
    # Store POS with ENST as key
    $data{$enst} = $pos;
}
close(FH2);
while (my $line = <FH1>)
{
    # Check line for ENST
    if ($line =~ m/^>(ENST\d+)/)
    {
        my $enst = $1;
        # Get pos for ENST
        my $pos = $data{$enst};
        # make new line (only if the ENST was found in two.txt)
        $line = '>' . $enst . '_' . $pos . "\n" if defined $pos;
    }
    print RESULT $line;
}
close(FH1);
close(RESULT);
This might work for you (GNU sed):
sed -n '/^$/!s|^\(\S*\)\s*\(\S*\).*|s/^\1.*/\1_\2/|p' two.txt | sed -f - one.txt
Try this MATLAB solution (no loops):
%# read files as cell array of lines
fid = fopen('one.txt','rt');
C = textscan(fid, '%s', 'Delimiter','\n');
C1 = C{1};
fclose(fid);
fid = fopen('two.txt','rt');
C = textscan(fid, '%s', 'Delimiter','\n');
C2 = C{1};
fclose(fid);
%# use regexp to extract ENST numbers from both files
num = regexp(C1, '>ENST(\d+)', 'tokens', 'once');
idx1 = find(~cellfun(@isempty, num)); %# location of >ENST line
val1 = str2double([num{:}]); %# ENST numbers
num = regexp(C2, '>ENST(\d+)', 'tokens', 'once');
idx2 = find(~cellfun(@isempty, num));
val2 = str2double([num{:}]);
%# construct new header lines from file2
C2(idx2) = regexprep(C2(idx2), ' +','_');
%# replace headers lines in file1 with the new headers
[tf,loc] = ismember(val2,val1);
C1( idx1(loc(tf)) ) = C2( idx2(tf) );
%# write result
fid = fopen('three.txt','wt');
fprintf(fid, '%s\n',C1{:});
fclose(fid);