perl Curses and unicode: why addstr prints fine whereas addstring prints garbage? - perl

addstr — code, output:
use Curses;
initscr;
addstr 0, 0, 'Ж 会 र';
addstr 1, 0, 'Curses ' . Curses->VERSION . ", perl $^V" . ", OS: $^O";
getch;
endwin;
Ж 会 र
Curses 1.43, perl v5.36.0, OS: openbsd
addstring — code, output:
use Curses;
initscr;
addstring 0, 0, 'Ж 会 र';
addstring 1, 0, 'Curses ' . Curses->VERSION . ", perl $^V" . ", OS: $^O";
getchar;
endwin;
Ð~V ä¼~Z र
Curses 1.43, perl v5.36.0, OS: openbsd
Why is this behavior observed?
Shouldn't it be vice versa, since addstr is legacy whereas addstring is meant to be supportive of unicode?
https://metacpan.org/pod/Curses#Wide-Character-Aware-Functions
https://metacpan.org/pod/Curses#Available-Wide-Character-Aware-Functions
Update:
Wider example, with unicode string:
hardcoded,
taken from a variable
passed as a CLI argument
read from a file via backticks
read from a file via open
We need a file with unicode string:
echo -n 'Ж 会 र' > unicode.string.txt
Case 1: addstr, no additional declarations:
use Curses;
my $unicode_string_variable = 'Ж 会 र';
my $unicode_string_argv = $ARGV[0];
my $unicode_string_backticks = `cat unicode.string.txt`;
open my $open_pipe_read_handle, '-|', 'cat', 'unicode.string.txt' || die;
my $unicode_string_open_pipe = <$open_pipe_read_handle>;
# print unicode to files
open my $hardcoded_handle, '>', 'unicode.string.hardcoded' || die;
print $hardcoded_handle 'Ж 会 र';
close $hardcoded_handle;
open my $variable_handle, '>', 'unicode.string.variable' || die;
print $variable_handle $unicode_string_variable;
close $variable_handle;
open my $argv_handle, '>', 'unicode.string.argv' || die;
print $argv_handle $unicode_string_argv;
close $argv_handle;
open my $backticks_handle, '>', 'unicode.string.backticks' || die;
print $backticks_handle $unicode_string_backticks;
close $backticks_handle;
open my $open_pipe_handle, '>', 'unicode.string.open_pipe' || die;
print $open_pipe_handle $unicode_string_open_pipe;
close $open_pipe_handle;
# print unicode to STDOUT
printf "%s: %s\n", 'hardcoded', 'Ж 会 र';
printf "%s: %s\n", 'variable ', $unicode_string_variable;
printf "%s: %s\n", 'argv ', $unicode_string_argv;
printf "%s: %s\n", 'backticks', $unicode_string_backticks;
printf "%s: %s\n", 'open_pipe', $unicode_string_open_pipe;
initscr;
# print unicode to Curses
addstr 0, 0, 'hardcoded: ' . 'Ж 会 र';
addstr 1, 0, 'variable : ' . $unicode_string_variable;
addstr 2, 0, 'argv : ' . $unicode_string_argv;
addstr 3, 0, 'backticks: ' . $unicode_string_backticks;
addstr 4, 0, 'open_pipe: ' . $unicode_string_open_pipe;
addstr 5, 0, 'Curses ' . Curses->VERSION . ", perl $^V" . ", OS: $^O";
getchar;
endwin;
run:
perl curses-unicode.addstr.pl 'Ж 会 र'
Curses output, all-working unicode:
hardcoded: Ж 会 र
variable : Ж 会 र
argv : Ж 会 र
backticks: Ж 会 र
open_pipe: Ж 会 र
Curses 1.43, perl v5.36.0, OS: openbsd
STDOUT output, all-working unicode:
hardcoded: Ж 会 र
variable : Ж 会 र
argv : Ж 会 र
backticks: Ж 会 र
open_pipe: Ж 会 र
Files output, all-working unicode:
cat unicode.string.*
Ж 会 रЖ 会 रЖ 会 रЖ 会 रЖ 会 रЖ 会 र
Case 2: addstring, no additional declarations:
use Curses;
my $unicode_string_variable = 'Ж 会 र';
my $unicode_string_argv = $ARGV[0];
my $unicode_string_backticks = `cat unicode.string.txt`;
open my $open_pipe_read_handle, '-|', 'cat', 'unicode.string.txt' || die;
my $unicode_string_open_pipe = <$open_pipe_read_handle>;
# print unicode to files
open my $hardcoded_handle, '>', 'unicode.string.hardcoded' || die;
print $hardcoded_handle 'Ж 会 र';
close $hardcoded_handle;
open my $variable_handle, '>', 'unicode.string.variable' || die;
print $variable_handle $unicode_string_variable;
close $variable_handle;
open my $argv_handle, '>', 'unicode.string.argv' || die;
print $argv_handle $unicode_string_argv;
close $argv_handle;
open my $backticks_handle, '>', 'unicode.string.backticks' || die;
print $backticks_handle $unicode_string_backticks;
close $backticks_handle;
open my $open_pipe_handle, '>', 'unicode.string.open_pipe' || die;
print $open_pipe_handle $unicode_string_open_pipe;
close $open_pipe_handle;
# print unicode to STDOUT
printf "%s: %s\n", 'hardcoded', 'Ж 会 र';
printf "%s: %s\n", 'variable ', $unicode_string_variable;
printf "%s: %s\n", 'argv ', $unicode_string_argv;
printf "%s: %s\n", 'backticks', $unicode_string_backticks;
printf "%s: %s\n", 'open_pipe', $unicode_string_open_pipe;
initscr;
# print unicode to Curses
addstring 0, 0, 'hardcoded: ' . 'Ж 会 र';
addstring 1, 0, 'variable : ' . $unicode_string_variable;
addstring 2, 0, 'argv : ' . $unicode_string_argv;
addstring 3, 0, 'backticks: ' . $unicode_string_backticks;
addstring 4, 0, 'open_pipe: ' . $unicode_string_open_pipe;
addstring 5, 0, 'Curses ' . Curses->VERSION . ", perl $^V" . ", OS: $^O";
getchar;
endwin;
run:
perl curses-unicode.addstring.pl 'Ж 会 र'
Curses output, all-broken unicode::
hardcoded: Ð~V ä¼~Z र
variable : Ð~V ä¼~Z र
argv : Ð~V ä¼~Z र
backticks: Ð~V ä¼~Z र
open_pipe: Ð~V ä¼~Z र
Curses 1.43, perl v5.36.0, OS: openbsd
STDOUT output, all-working unicode::
hardcoded: Ж 会 र
variable : Ж 会 र
argv : Ж 会 र
backticks: Ж 会 र
open_pipe: Ж 会 र
Files output, all-working unicode:
cat unicode.string.*
Ж 会 रЖ 会 रЖ 会 रЖ 会 रЖ 会 रЖ 会 र
Case 3: addstring, additional declarations use utf8, -CA and :encoding(UTF-8):
use utf8;
use Curses;
my $unicode_string_variable = 'Ж 会 र';
my $unicode_string_argv = $ARGV[0];
my $unicode_string_backticks = `cat unicode.string.txt`;
open my $open_pipe_read_handle, '-|:encoding(UTF-8)', 'cat', 'unicode.string.txt' || die;
my $unicode_string_open_pipe = <$open_pipe_read_handle>;
# print unicode to files
open my $hardcoded_handle, '>', 'unicode.string.hardcoded' || die;
print $hardcoded_handle 'Ж 会 र';
close $hardcoded_handle;
open my $variable_handle, '>', 'unicode.string.variable' || die;
print $variable_handle $unicode_string_variable;
close $variable_handle;
open my $argv_handle, '>', 'unicode.string.argv' || die;
print $argv_handle $unicode_string_argv;
close $argv_handle;
open my $backticks_handle, '>', 'unicode.string.backticks' || die;
print $backticks_handle $unicode_string_backticks;
close $backticks_handle;
open my $open_pipe_handle, '>', 'unicode.string.open_pipe' || die;
print $open_pipe_handle $unicode_string_open_pipe;
close $open_pipe_handle;
# print unicode to STDOUT
printf "%s: %s\n", 'hardcoded', 'Ж 会 र';
printf "%s: %s\n", 'variable ', $unicode_string_variable;
printf "%s: %s\n", 'argv ', $unicode_string_argv;
printf "%s: %s\n", 'backticks', $unicode_string_backticks;
printf "%s: %s\n", 'open_pipe', $unicode_string_open_pipe;
initscr;
# print unicode to Curses
addstring 0, 0, 'hardcoded: ' . 'Ж 会 र';
addstring 1, 0, 'variable : ' . $unicode_string_variable;
addstring 2, 0, 'argv : ' . $unicode_string_argv;
addstring 3, 0, 'backticks: ' . $unicode_string_backticks;
addstring 4, 0, 'open_pipe: ' . $unicode_string_open_pipe;
addstring 5, 0, 'Curses ' . Curses->VERSION . ", perl $^V" . ", OS: $^O";
getchar;
endwin;
run:
perl -CA curses-unicode.addstring.utf8,CA,encodingUTF8.pl 'Ж 会 र'
Curses output, partially-working, partially-broken unicode::
hardcoded: Ж 会 र
variable : Ж 会 र
argv : Ж 会 र
backticks: Ð~V ä¼~Z र
open_pipe: Ж 会 र
Curses 1.43, perl v5.36.0, OS: openbsd
STDOUT&STDERR output, all-working unicode:
Wide character in print at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line 12, <$open_pipe_read_handle> line 1.
Wide character in print at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line 15, <$open_pipe_read_handle> line 1.
Wide character in print at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line 18, <$open_pipe_read_handle> line 1.
Wide character in print at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line 24, <$open_pipe_read_handle> line 1.
Wide character in printf at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line 28, <$open_pipe_read_handle> line 1.
hardcoded: Ж 会 र
Wide character in printf at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line 29, <$open_pipe_read_handle> line 1.
variable : Ж 会 र
Wide character in printf at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line 30, <$open_pipe_read_handle> line 1.
argv : Ж 会 र
backticks: Ж 会 र
Wide character in printf at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line 32, <$open_pipe_read_handle> line 1.
open_pipe: Ж 会 र
Files output, all-working unicode:
cat unicode.string.*
Ж 会 रЖ 会 रЖ 会 रЖ 会 रЖ 会 रЖ 会 र
Why unicode just works for STDOUT and writing to files in all 3 cases without any hassle, whereas Curses balks? What is so special to Curses? Isn't it a bug of some kind in Curses given that with STDOUT and files all OK?
Is there a single place to enable unicode or need you to specify separately; where is uniformity; why?:
use utf8 for unicode in the source;
-CA for cli arguments;
:encoding(UTF-8) for open
How to fix unicode for backticks?
What are Wide character in print at curses-unicode.addstring.utf8,CA,encodingUTF8.pl line ..., <$open_pipe_read_handle> line 1. on STDERR and how to rid of these?

You need the use utf8; pragma:
use utf8;
use Curses;
initscr;
addstring 0, 0, 'Ж 会 र';
addstring 1, 0, 'Curses ' . Curses->VERSION . ", perl $^V" . ", OS: $^O";
getch;
endwin;
Output:
Ж 会 र
Curses 1.43, perl v5.34.0, OS: linux
See the Perl Unicode FAQ. Why the addstr version does work is probably a matter of luck (on my system only the third character is correctly displayed).
If you want to handle command line arguments from $ARGV as utf8 then you need a different approach. One way is to call Perl explicitly with the -C flag set to A or 32 (this is a special setting that controls the $ARGV encoding) or equivalently by setting the PERL_UNICODE environment variable in the terminal to A.
Alternatively you can re-encode $ARGV from within the code:
use Encode qw(decode_utf8);
#ARGV = map { decode_utf8($_, 1) } #ARGV;
In this case you don't need the command line flag.
This alternative also works for backticks substitution:
use Encode qw(decode_utf8);
my $unicode_string_backticks = decode_utf8(`cat unicode.string.txt`, 1);
Source: https://www.perl.com/pub/2012/04/perlunicookbook-decode-argv-as-utf8.html/
There is however a simpler solution that sets utf8 for hardcoded strings, argv, filehandles, printf, backticks etc. simultaneously, which is the utf8::all module. With this you don't need command line flags or the Encode module.- Because it targets STDOUT the warnings about wide characters are resolved as well.
use utf8::all;
use Curses;
my $unicode_string_variable = 'Ж 会 र';
my $unicode_string_argv = $ARGV[0];
#my $unicode_string_backticks = decode_utf8(`cat unicode.string.txt`,1);
my $unicode_string_backticks = `cat unicode.string.txt`;
open my $open_pipe_read_handle, '-|:encoding(UTF-8)', 'cat', 'unicode.string.txt' || die;
my $unicode_string_open_pipe = <$open_pipe_read_handle>;
# print unicode to files
open my $hardcoded_handle, '>', 'unicode.string.hardcoded' || die;
print $hardcoded_handle 'Ж 会 र';
close $hardcoded_handle;
open my $variable_handle, '>', 'unicode.string.variable' || die;
print $variable_handle $unicode_string_variable;
close $variable_handle;
open my $argv_handle, '>', 'unicode.string.argv' || die;
print $argv_handle $unicode_string_argv;
close $argv_handle;
open my $backticks_handle, '>', 'unicode.string.backticks' || die;
print $backticks_handle $unicode_string_backticks;
close $backticks_handle;
open my $open_pipe_handle, '>', 'unicode.string.open_pipe' || die;
print $open_pipe_handle $unicode_string_open_pipe;
close $open_pipe_handle;
# print unicode to STDOUT
printf "%s: %s\n", 'hardcoded', 'Ж 会 र';
printf "%s: %s\n", 'variable ', $unicode_string_variable;
printf "%s: %s\n", 'argv ', $unicode_string_argv;
printf "%s: %s\n", 'backticks', $unicode_string_backticks;
printf "%s: %s\n", 'open_pipe', $unicode_string_open_pipe;
initscr;
# print unicode to Curses
addstring 0, 0, 'hardcoded: ' . 'Ж 会 र';
addstring 1, 0, 'variable : ' . $unicode_string_variable;
addstring 2, 0, 'argv : ' . $unicode_string_argv;
addstring 3, 0, 'backticks: ' . $unicode_string_backticks;
addstring 4, 0, 'open_pipe: ' . $unicode_string_open_pipe;
addstring 5, 0, 'Curses ' . Curses->VERSION . ", perl $^V" . ", OS: $^O";
getchar;
endwin;
Source: https://blog.ostermiller.org/perl-wide-character-in-print/
If for whatever reason you can't or don't want to install this module then use utf8; together with the command line flags -CSDA also resolves all issues. Note that with these command line flags you should not use decode_utf8() in your code.

Related

How to log the file using Log::Log4perl module in perl

I am beginner in perl .I write the script for adding two number using
Getopt::Long module .I am facing difficulty to log the below script using Log::Log4perl module. Can anyone please help me with that.
use strict;
use warnings;
use Getopt::Long;
use Log::Log4perl;
my $num1=<>;
my $num2=<>;
chomp($num1,$num2);
my $res=GetOptions("numone=i"=>\$num1,
"numtwo=i"=>\$num2);
my $add=$num1 + $num2;
print $add;
Here is an example:
use feature qw(say);
use strict;
use warnings;
use Getopt::Long;
use Log::Log4perl qw(:easy);
my $conf = do { local $/; <DATA> };
Log::Log4perl::init( \$conf );
my $num1;
my $num2;
if (#ARGV == 0) {
print "Enter number1: ";
chomp($num1 = <>);
INFO( "Read number1 from STDIN: '$num1'" );
print "Enter number2: ";
chomp($num2=<>);
INFO( "Read number2 from STDIN: '$num2'" );
}
else {
GetOptions("numone=i" =>\$num1, "numtwo=i"=>\$num2)
or die "Bad command line options\n";
die "Number1 undefined\n" if !defined $num1;
die "Number2 undefined\n" if !defined $num2;
INFO( "Read number1 from ARGV: '$num1'" );
INFO( "Read number2 from ARGV: '$num2'" );
}
my $add = $num1 + $num2;
say "Result: $num1 + $num2 = $add";
__DATA__
log4perl.rootLogger=INFO, LOGFILE
log4perl.appender.LOGFILE=Log::Log4perl::Appender::File
log4perl.appender.LOGFILE.filename=mylog.log
log4perl.appender.LOGFILE.mode=append
log4perl.appender.LOGFILE.layout=PatternLayout
log4perl.appender.LOGFILE.layout.ConversionPattern=[%r] %F %L %c - %m%n
Example session:
$ p.pl --numone=3 --numtwo=4
Result: 3 + 4 = 7
$ cat mylog.log
[0] ./p.pl 26 main - Read number1 from ARGV: '3'
[0] ./p.pl 27 main - Read number2 from ARGV: '4'

"Turn Off" binmode(STDOUT, ":utf8") Locally

I Have The following block in the beginning of my script:
#!/usr/bin/perl5 -w
use strict;
binmode(STDIN, ":utf8");
binmode(STDOUT, ":utf8");
binmode(STDERR, ":utf8");
In some subroutines when there is other encoding(from a distant subroutine), the data will not display correctly, when receiving cyrillic or other characters. It is the "binmode", that causes the problem.
Can I "turn off" the binmode utf8 locally, for the subroutine only?
I can't remove the global binmode setting and I can't change the distant encoding.
One way to achieve this is to "dup" the STD handle, set the duplicated filehandle to use the :raw layer, and assign it to a local version of the STD handle. For example, the following code
binmode(STDOUT, ':utf8');
print(join(', ', PerlIO::get_layers(STDOUT)), "\n");
{
open(my $duped, '>&', STDOUT);
# The ':raw' argument could also be omitted.
binmode($duped, ':raw');
local *STDOUT = $duped;
print(join(', ', PerlIO::get_layers(STDOUT)), "\n");
close($duped);
}
print(join(', ', PerlIO::get_layers(STDOUT)), "\n");
prints
unix, perlio, utf8
unix, perlio
unix, perlio, utf8
on my system.
I like #nwellnhof's approach. Dealing only with Unicode and ASCII - a luxury few enjoy - my instinct would be to leave the bytes as is and selectively make use of Encode to decode()/encode() when needed. If you are able to determine which of your data sources are problematic you could filter/insert decode when dealing with them.
% file koi8r.txt
koi8r.txt: ISO-8859 text
% cat koi8r.txt
������ �� ����� � ������� ���. ���
���� ����� ������ ����� �����.
% perl -CO -MEncode="encode,decode" -E 'say decode("koi8-r", <>) ;' koi8r.txt
Американские суда находятся в международных водах. Япония
You could use something like Scope::Guard - lexically-scoped resource management to ensure it gets set back to :utf8 when you leave the scope, regardless of how (return, die, whatever):
#!/usr/bin/perl -w
use strict;
use Scope::Guard qw(guard);
binmode(STDOUT, ':utf8');
print(join(', ', PerlIO::get_layers(STDOUT)), "\n");
{
# When guard goes out of scope, this sub is guaranteed to be called:
my $guard = guard {
binmode(STDOUT, ':utf8');
};
binmode(STDOUT, ':raw');
print(join(', ', PerlIO::get_layers(STDOUT)), "\n");
}
print(join(', ', PerlIO::get_layers(STDOUT)), "\n");
Or, if you don't want to include a new dependency like Scope::Guard (Scope::Guard is awesome for this kind of localizing...):
#!/usr/bin/perl -w
use strict;
binmode(STDOUT, ':utf8');
print(join(', ', PerlIO::get_layers(STDOUT)), "\n");
{
my $guard = PoorMansGuard->new(sub {
binmode(STDOUT, ':utf8');
});
binmode(STDOUT, ':raw');
print(join(', ', PerlIO::get_layers(STDOUT)), "\n");
}
print(join(', ', PerlIO::get_layers(STDOUT)), "\n");
package PoorMansGuard;
sub new {
my ($class, $sub) = #_;
bless { sub => $sub }, $class;
}
sub DESTROY {
my ($self) = #_;
$self->{sub}->();
}

Perl - File Encoding and Word Comparison

I have a file with one phrase/terms each line which i read to perl from STDIN. I have a list of stopwords (like "á", "são", "é") and i want to compare each one of them with each term, and remove if they are equal. The problem is that i'm not certain of the file's encoding format.
I get this from the file command:
words.txt: Non-ISO extended-ASCII English text
My linux terminal is in UTF-8 and it shows the right content for some words and for others don't. Here is the output from some of them:
condi<E3>
conte<FA>dos
ajuda, mas não resolve
mo<E7>ambique
pedagógico são fenómenos
You can see that the 3rd and 5th lines are correctly identifying words with accents and special characters while others don't. The correct output for the other lines should be: condiã, conteúdos and moçambique.
If i use binmode(STDOUT, utf8) the "incorrect" lines now output correctly while the other ones don't. For example the 3rd line:
ajuda, mas não resolve
What should i do guys?
I strongly suggest you create a filter that takes a file with lines in mixed encodings and translates them to pure UTF-8. Then instead
open(INPUT, "< badstuff.txt") || die "open failed: $!";
you would open either the fixed version, or a pipe from the fixer, like:
open(INPUT, "fixit < badstuff.txt |") || die "open failed: $!"
In either event, you would then
binmode(INPUT, ":encoding(UTF-8)") || die "binmode failed";
Then the fixit program could just do this:
use strict;
use warnings;
use Encode qw(decode FB_CROAK);
binmode(STDIN, ":raw") || die "can't binmode STDIN";
binmode(STDOUT, ":utf8") || die "can't binmode STDOUT";
while (my $line = <STDIN>) {
$line = eval { decode("UTF-8", $line, FB_CROAK() };
if ($#) {
$line = decode("CP1252", $line, FB_CROAK()); # no eval{}!
}
$line =~ s/\R\z/\n/; # fix raw mode reads
print STDOUT $line;
}
close(STDIN) || die "can't close STDIN: $!";
close(STDOUT) || die "can't close STDOUT: $!";
exit 0;
See how that works? Of course, you could change it to default to some other encoding, or have multiple fall backs. Probably it would be best to take a list of them in #ARGV.
It works like this:
C:\Dev\Perl :: chcp
Aktive Codepage: 1252.
C:\Dev\Perl :: type mixed-encoding.txt
eins zwei drei Käse vier fünf Wurst
eins zwei drei Käse vier fünf Wurst
C:\Dev\Perl :: perl mixed-encoding.pl < mixed-encoding.txt
eins zwei drei vier fünf
eins zwei drei vier fünf
Where mixed-encoding.pl goes like this:
use strict;
use warnings;
use utf8; # source in UTF-8
use Encode 'decode_utf8';
use List::MoreUtils 'any';
my #stopwords = qw( Käse Wurst );
while ( <> ) { # read octets
chomp;
my #tokens;
for ( split /\s+/ ) {
# Try UTF-8 first. If that fails, assume legacy Latin-1.
my $token = eval { decode_utf8 $_, Encode::FB_CROAK };
$token = $_ if $#;
push #tokens, $token unless any { $token eq $_ } #stopwords;
}
print "#tokens\n";
}
Note that the script doesn't have to be encoded in UTF-8. It's just that if you have funky character data in your script you have to make sure the encoding matches, so use utf8 if your encoding is UTF-8, and don't if it isn't.
Update based on tchrist's sound advice:
use strict;
use warnings;
# source in Latin1
use Encode 'decode';
use List::MoreUtils 'any';
my #stopwords = qw( Käse Wurst );
while ( <> ) { # read octets
chomp;
my #tokens;
for ( split /\s+/ ) {
# Try UTF-8 first. If that fails, assume 8-bit encoding.
my $token = eval { decode utf8 => $_, Encode::FB_CROAK };
$token = decode Windows1252 => $_, Encode::FB_CROAK if $#;
push #tokens, uc $token unless any { $token eq $_ } #stopwords;
}
print "#tokens\n";
}

How can I use the range operator '..' to create a utf-8 alphabet?

Is there a way to create a UTF-8 alphabet array using the Perl '..' operator?
For example, this one won't work:
$ cat t.pl
#!/usr/bin/perl
use Data::Dumper;
use encoding 'utf8';
print Dumper('А'..'Я'); # not working!
print Dumper('А','Б','В'); # ...works fine! but needs to be filling letter by letter
$ perl t.pl
$VAR1 = "\x{410}";
$VAR1 = "\x{410}";
$VAR2 = "\x{411}";
$VAR3 = "\x{412}";
$ echo $LANG
en_US.UTF-8
Any advice?
This is mentioned - briefly - in the range operator docs.
You need to use the ord and chr functions:
#!/usr/bin/perl
use Data::Dumper;
use encoding 'utf8';
my #arry = map { chr } ord( 'А' ) .. ord( 'Я' );
for my $letter ( #arry ) {
print "$letter ";
}
print "\n";
Output:
А Б В Г Д Е Ж З И Й К Л М Н О П Р С Т У Ф Х Ц Ч Ш Щ Ъ Ы Ь Э Ю Я
The result you see arises because the initial value of the range isn't part of a 'magical' sequence (a non-empty string matching /^[a-zA-Z]*[0-9]*\z/), so the operator just returns that initial value.

How do I fix a multi-line runaway string error in Perl?

There are some errors in my Perl script, I looked though the source code but couldn't find the problem.
#Tool: decoding shell codes/making shell codes
use strict;
use Getopt::Std;
my %opts=();
getopts("f:xa", \%opts);
my($infile, $hex);
my($gen_hex, $gen_ascii);
sub usage() {
print "$0 -f <file> [-x | -a] \n\t";
print '-p <path to input file>'."\n\t";
print '-x convert "\nxXX" hex to readable ascii'."\n\t";
print '-a convert ascii to "\xXX" hex'."\n\t";
print "\n";
exit;
}
$infile = $opts{f};
$gen_hex = $opts{a};
$gen_ascii = $opts{x};use
if((!opts{f} || (!$gen_hex && !$gen_ascii)) {
usage();
exit;
}
if($infile) {
open(INFILE,$infile) || die "Error Opening '$infile': $!\n";
while(<INFILE>) {
#Strips newlines
s/\n/g;
#Strips tabs
s/\t//g;
#Strips quotes
s/"//g;
$hex .= $_;
}
}
if($gen_ascii) {
# \xXX hex style to ASCII
$hex =~ s/\\x([a-fA-F0-9]{2,2})/chr(hex($1)/eg;
}
elsif ($gen_hex) {
$hex =~ s/([\W|\w)/"\\x" . uc(sprintf("%2.2x",ord($1)))/eg;
}
print "\n$hex\n";
if($infile) {
close(INFILE);
}
gives me the errors
Backslash found where operator expected at 2.txt line 36, near "s/\"
(Might be runaway multi-line // string starting on line 34)
syntax error at 2.txt line 25, near ") {"
syntax error at 2.txt line 28, near "}"
syntax error at 2.txt line 36, near "s/\"
syntax error at 2.txt line 41. nar "}"
Execution of 2.txt aborted due to compilation errors
Do you see the problems?
#Strips newlines
s/\n/g;
Is wrong. You forgot an extra /:
#Strips newlines
s/\n//g;
Also, there are too few parenthesis here:
if((!opts{f} || (!$gen_hex && !$gen_ascii)) {
Rather than add some, you appear to have one extra one. Just take it out.
As a side note, try to use warnings; whenever possible. It's a Good Thing™.
EDIT: While I'm at it, you might want to be careful with your open()s:
open(INPUT,$input);
can be abused. What if $input is ">file.txt"? Then open() will try to open the file for writing - not what you want. Try this instead:
open(INPUT, "<", $input);
There are many errors: trailing use, missing / in s operator, unbalanced brackets in if expression. Little bit tidy up:
use strict;
use Getopt::Std;
my %opts = ();
getopts( "f:xa", \%opts );
my ( $gen_hex, $gen_ascii );
sub usage() {
print <<EOU
$0 -f <file> [-x | -a]
-p <path to input file>
-x convert "\\xXX" hex to readable ascii
-a convert ascii to "\\xXX" hex
EOU
}
#ARGV = ( $opts{f} ) if exists $opts{f};
$gen_hex = $opts{a};
$gen_ascii = $opts{x};
if ( not( $gen_hex xor $gen_ascii ) ) {
usage();
exit;
}
my $transform = $gen_ascii
? sub {
s/\\x([a-fA-F0-9]{2,2})/pack'H2', $1/eg;
}
: sub {
s/([^[:print:]])/'\\x'.uc unpack'H2', $1/eg;
};
while (<>) {
s/\n #Strips newlines
| \t #Strips tabs
| " #Strips quotes
//xg;
&$transform;
print;
}
line25: if((!opts{f} || (!$gen_hex && !$gen_ascii)) {
line26: usage();
It's $opts{f}
Actually, I think the error is here :
s/"//g;
The double quotes should be escaped, so that the line would become :
s/\"//g;
You can notice that this is the line the syntax highlighting goes wrong on SO.