Write unicode strings to a file in Matlab - matlab

I have a string containing Urdu characters, such as 'بجلی', stored as a 1x4 char array. I want to save it to a file that will be viewed externally. The string doesn't display correctly in the Command Window, but the variable 'str' does hold it. When I save it using fprintf(fid, str) and open the file in Notepad, arrows appear instead of the original characters. I can paste the characters into Notepad manually without any trouble. Where is the problem?

You need to use fwrite() not fprintf():
fid = fopen('temp.txt', 'w');
str = char([1576, 1580, 1604, 1740, 10]);
encoded_str = unicode2native(str, 'UTF-8');
fwrite(fid, encoded_str, 'uint8');
fclose(fid);
verified with:
perl -E "open my $fh, q{<:utf8}, q{temp.txt}; while (<$fh>) {while (m/(.)/g) {say ord $1}}"
1576
1580
1604
1740

You don't actually need to avoid fprintf in order to write UTF-8 strings to a file. The key is to open the file with the right encoding:
f = fopen('temp.txt', 'w', 'native', 'UTF-8');
s = char([1576, 1580, 1604, 1740]);
fprintf(f, 'This is written as UTF-8: %s.\n', s);
fclose(f);
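A quick way to check the result from within MATLAB itself, rather than with an external tool, is to read the raw bytes back and decode them. This is only a sketch: the scratch file name is made up for illustration, and it assumes unicode2native and native2unicode are available in your release.

```matlab
% Round-trip check: write the string as UTF-8, read the raw bytes back, decode.
fname = fullfile(tempdir, 'utf8_roundtrip.txt');   % hypothetical scratch file
s = char([1576, 1580, 1604, 1740]);

fid = fopen(fname, 'w');
fwrite(fid, unicode2native(s, 'UTF-8'), 'uint8');
fclose(fid);

fid = fopen(fname, 'r');
bytes = fread(fid, Inf, 'uint8=>uint8')';          % raw bytes as a row vector
fclose(fid);
delete(fname);

decoded = native2unicode(bytes, 'UTF-8');
disp(numel(bytes));            % number of bytes stored in the file
disp(double(decoded));         % code points after decoding
disp(isequal(decoded, s));     % true if the round trip preserved the string
```

Each of these four characters lies below U+0800, so each takes two bytes in UTF-8 and the file should hold 8 bytes in total.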

Looking up every character's code in a character map can be tedious. The code can be modified to use the characters directly:
fid = fopen('temp.txt', 'w');
str = char(['س','ل','ا','م']);
encoded_str = unicode2native(str, 'UTF-8');
fwrite(fid, encoded_str, 'uint8');
fclose(fid);
This is easier, but the one drawback is that it requires you to have Arabic/Persian/Urdu, etc. installed.

Related

Escape characters in Matlab

I am reading a file using fileread(), which returns the entire file as a single string. Now I need to go through it line by line and process the data. How can I detect the newline character in Matlab? I tried '\n' and '\r\n' and it doesn't work.
Thanks in advance
For special characters, either use the char function with the character code (http://www.asciitable.com/) or sprintf (my preferred way, for better readability).
For example, you are looking for sprintf('\n') or sprintf('\r\n').
char(13) is carriage return \r
char(10) is new line \n
You can read the file line by line (see fgetl):
fid = fopen ( 'file', 'r' );
% Check that it opened okay
if fid ~= -1
while ( true )
line = fgetl ( fid );
% Check for end of file
if ~ischar(line); break; end   % fgetl returns -1 (not a char array) at end of file
%Do stuff with line;
end
fclose ( fid );
end
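If the whole file is already in memory from fileread, it can also be split into lines in one step. The sketch below uses a made-up sample string as a stand-in for the fileread output. Note that a single-quoted '\n' is just the two characters backslash and n, which is why comparing against it directly fails; regexp, like sprintf, interprets the escape.

```matlab
% Stand-in for txt = fileread('file'); mixes Windows and Unix line endings.
txt = sprintf('first line\r\nsecond line\nthird line\n');

% Split on either \r\n or \n; regexp interprets the escape sequences itself.
lines = regexp(txt, '\r?\n', 'split');

% A trailing newline leaves an empty last element; drop it.
if isempty(lines{end})
    lines(end) = [];
end

for k = 1:numel(lines)
    fprintf('%d: %s\n', k, lines{k});
end
```

This leaves a cell array with one entry per line, here three entries.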

Add additional string to printing

Hi, I would like to print a string and then add dots to the end of it, rather than reprinting the whole string every time. I want only the new dots to be appended to the string that has already been printed.
reboot = '### rebooting the mmp';
display(reboot)
for i = 1 : 15
reboot = strcat(reboot,'.')
pause(1);
end
How would I do this?
Rather than printing out the entire string every time, you can just print out a new dot each time through the loop.
To make this work, use fprintf to print the dot rather than disp: disp automatically appends a newline, whereas fprintf does not, so all of the dots end up on the same line.
% Print the initial message without a trailing newline
fprintf('### rebooting the mmp');
% Print 5 dots all on the same line with a 1-second pause
for k = 1:5
fprintf('.')
pause(1)
end
% We DO want to print a newline after we're all done
fprintf('\n')
% The same idea using the reboot variable from the question
fprintf('%s', reboot)
for i=1:15
fprintf('.')
pause(1)
end

Altering multiple text files using grep awk sed perl or something else

I have multiple text files named split01.txt, split02.txt etc... with the data in the format below: (This is what I have)
/tmp/audio_files/n000001.wav;
/tmp/audio_files/n000002.wav;
/tmp/audio_files/n000003.wav;
/tmp/audio_files/p000004.wav;
/tmp/audio_files/p000005.wav;
I would like to create another file with the data taken from the split01.txt, split02.txt etc... file in the format below: (this is the format I would like to see)
[playlist]
NumberOfEntries=5
File000001=n000001.wav
Title000001=n000001.wav
File000002=n000002.wav
Title000002=n000002.wav
File000003=n000003.wav
Title000003=n000003.wav
File000004=p000004.wav
Title000004=p000004.wav
File000005=p000005.wav
Title000005=p000005.wav
Version=2
Can this be done in one instance? The reason I ask is that I'm going to be running/calling the command (awk,grep,sed,etc...) from inside of octave/matlab after the initial process has completed creating the audio files.
Example of what I mean by one instance (matlab/octave code):
system(strcat({'split --lines=3600 -d '},dirpathwaveformstmp,fileallplaylistStr,{' '},dirpathwaveformstmp,'allsplit'))
This splits a single file into multiple files with the names allsplit01 allsplit02 etc.. and each file only has a max of 3600 lines.
For those who asked this is creating playlist files for audio files I create with octave/matlab.
Any suggestions?
Here's one way you could do it with awk:
parse.awk
BEGIN {
print "[playlist]"
print "NumberOfEntries=" len "\n"
i = 1
}
{
gsub(".*/|;", "")
printf "File%06d=%s\n" , i, $0
printf "Title%06d=%s\n\n", i, $0
i++
}
END {
print "Version 2"
}
Run it like this:
awk -v len=$(wc -l < infile) -f parse.awk infile
Output:
[playlist]
NumberOfEntries=5
File000001=n000001.wav
Title000001=n000001.wav
File000002=n000002.wav
Title000002=n000002.wav
File000003=n000003.wav
Title000003=n000003.wav
File000004=p000004.wav
Title000004=p000004.wav
File000005=p000005.wav
Title000005=p000005.wav
Version 2
If you're writing your program in Octave, why don't you do it in Octave as well? The language is not limited to numerical analysis. What you're trying to do can be done quite easily with Octave functions.
filepath = "path for input file"
playlistpath = "path for output file"
## read file and prepare cell array for printing
files = strsplit (fileread (filepath)', "\n");
if (isempty (files{end}))
files(end) = [];
endif
[~, names, exts] = cellfun (@fileparts, files, "UniformOutput", false);
files = strcat (names, exts);
files(2,:) = files(1,:);
files(4,:) = files(1,:);
files(1,:) = num2cell (1:columns(files))(:);
files(3,:) = num2cell (1:columns(files))(:);
## write playlist
[fid, msg] = fopen (playlistpath, "w");
if (fid < 0)
error ("Unable to fopen %s for writing: %s", playlistpath, msg);
endif
fprintf (fid, "[playlist]\n");
fprintf (fid, "NumberOfEntries=%i\n", columns (files));
fprintf (fid, "\n");
fprintf (fid, "File%06d=%s\nTitle%06d=%s\n\n", files{:});
fprintf (fid, "Version 2");
if (fclose (fid))
error ("Unable to fclose file %s with FID %i", playlistpath, fid);
endif

Reading comma in content of csv file , matlab

I have a csv file that contains comma in contents.
% with dot
15.12.2012 11:27; 0.9884753
11.12.2012 11:12; 10.670.642
11.12.2012 10:57; 114.455.145
Gdata= textscan(fid, '%s %f')
It works well.
% but what to do with comma
15.12.2012 11:27; 0,9884753
11.12.2012 11:12; 10,670.642
11.12.2012 10:57; 114,455.145
How can I read it?
regards,
This may resolve the inconsistency caused by the presence of both ',' and '.':
fid = fopen('data.d','r');
Gdata= textscan(fid, '%s %s','delimiter', ';' )
% // cancels '.' and sets ',' as '.'
f = @(i) str2double(regexprep(regexprep(i,'\.',''),',','\.'));
Num = cellfun(f,Gdata(2),'UniformOutput' , false);
Num{:}
ans =
0.9885
10.6706
114.4551
Unfortunately, textscan doesn't respect locale settings, so there's no way to make it interpret the comma as a decimal point by modifying the current locale. As a workaround, you could read the entire line in, replace the comma with a dot and then use textscan to parse the line.
line = fgetl( fid );
line = strrep( line, ',', '.' );
Gdata = textscan( line, '%s %f' );
You may have to resort to regexp or something else fancier than a simple strrep if the line may contain commas that you don't want replaced.
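One way to avoid touching commas elsewhere in the line is to split on the ';' first and replace the comma only in the numeric field. This is a rough sketch under an assumption that does not quite match the question's data: the sample values are simplified so that the comma is the only separator inside the number (no thousands separators).

```matlab
% Sample lines in the question's layout; values simplified (assumption).
lines = {'15.12.2012 11:27; 0,9884753', ...
         '11.12.2012 11:12; 10,670642', ...
         '11.12.2012 10:57; 114,455145'};

stamps = cell(numel(lines), 1);
vals   = zeros(numel(lines), 1);
for k = 1:numel(lines)
    parts     = strsplit(lines{k}, ';');     % {timestamp, number}
    stamps{k} = strtrim(parts{1});           % timestamp is left untouched
    % Replace the decimal comma only in the numeric field, then convert.
    vals(k)   = str2double(strrep(strtrim(parts{2}), ',', '.'));
end
disp(vals);
```

If the numbers can contain both a thousands separator and a decimal separator, the field needs a rule for which is which before conversion.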

Matlab: Remove chars from string with unicode chars

I have a long string that looks like:
その他,-9999.00
その他,-9999.00
その他,-9999.00
その他,-9999.00
and so forth. I'd like to split at linebreak and remove everything up to a comma, and just keep the floats. So my output should be something like:
A =
[-9999.99 -9999.99 -9999.99 -9999.99]
Any idea how to do that relatively quickly (a few seconds at most)? There are close to a million lines in that string.
Thanks!
I think the best way to do this is with textscan:
out = textscan(str, '%*s%f', 'delimiter', ',');
out = out{1};
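To try this without the full data set, here is a small self-contained sketch. The three-line sample string is made up for illustration (the last value is changed so the rows are distinguishable), and the final transpose is only there to get the row vector the question asks for.

```matlab
% Made-up stand-in for the long input string.
str = sprintf('その他,-9999.00\nその他,-9999.00\nその他,12.50\n');

% %*s skips the text before the comma, %f reads the number after it.
out = textscan(str, '%*s%f', 'Delimiter', ',');
A = out{1}.';     % textscan returns a column; transpose to a row vector
disp(A);
```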
I'm assuming the input is in a file. And I'm also assuming that the file is UTF-8 encoded, otherwise this won't work.
My solution is a simple Perl script. No doubt it can be done with MATLAB, but different tools have different strengths. I wouldn't attempt numerical analysis with Perl, that's for sure.
convert.pl
print "A = \n [ ";
while (<>) {
chomp;
s/.*,//;
print " ";
print;
}
print " ]";
input.txt
その他,-9999.00
その他,-9999.00
その他,-9999.00
その他,-9999.00
Command line
perl convert.pl < input.txt > output.txt
output.txt
A =
[ -9999.00 -9999.00 -9999.00 -9999.00 ]
Partial answer since I don't have access to matlab from home
The following splits on tab. The same approach can be used to split on newline by changing the pattern to '\n'.
s=sprintf('one\ttwo three\tfour');
r=regexp(s,'\t','split')
% r = 'one' 'two three' 'four'
help strtok might be helpful as well
Here's how to use regexp with Matlab for your problem (with str containing your string):
out = regexp(str,[',([^,',char(10),']+)',char(10)],'tokens')
out = cat(1,out{:});
str2double(out)
out =
-9999
-9999
-9999
-9999
One simple way to extract the numeric parts and convert them to doubles is to use the functions ISMEMBER and STR2NUM:
A = str2num(str(ismember(str,',.e-0123456789')));