Perl unicode support from console (#ARGV) /Windows/

Perl unicode support from console (#ARGV) /Windows/ - perl

I'm trying to get unicode characters as arguments in perl script:
C:\>perl test.pl ö
#----
# test.pl
#----
#!/usr/bin/perl
use warnings;
use strict;
my ($name, $number) = #ARGV;
if (not defined $name) {
die "Need name\n";
}
if (defined $number) {
print "Save '$name' and '$number'\n";
# save name/number in database
exit;
}
if ($name eq 'ö') {
print "Fetch umlaut 'oe'\n";
} elsif ($name eq 'o') {
print "Fetch simple 'o'\n";
} else {
print "Fetch other '$name'\n";
}
print "ü";
and I get the output:
Fetch simple 'o'
ü
I've tested the code (algorithm) in python 3 and it works, so I get "ö".
But obviously in perl there is something more that I must add or set.
It doesn't matter if it is Strawberry Perl or ActiveState Perl. I get the same result.
Thanks in advance!

#!/usr/bin/perl
use strict;
use warnings;
my $encoding_in;
my $encoding_out;
my $encoding_sys;
BEGIN {
require Win32;
$encoding_in = 'cp' . Win32::GetConsoleCP();
$encoding_out = 'cp' . Win32::GetConsoleOutputCP();
$encoding_sys = 'cp' . Win32::GetACP();
binmode(STDIN, ":encoding($encoding_in)");
binmode(STDOUT, ":encoding($encoding_out)");
binmode(STDERR, ":encoding($encoding_out)");
}
use Encode qw( decode );
{
my ($name, $number) = map { decode($encoding_sys, $_) } #ARGV;
if (not defined $name) {
die "Need name\n";
}
if (defined $number) {
print "Save '$name' and '$number'\n";
# save name/number in database
exit;
}
if ($name eq 'ö') {
print "Fetch umlaut 'oe'\n";
} elsif ($name eq 'o') {
print "Fetch simple 'o'\n";
} else {
print "Fetch other '$name'\n";
}
print "ü";
}
Also, you should add use feature qw( unicode_strings ); and/or encode your file using UTF-8 and add use utf8;.

In addition to ikagami's fine answer, I'm a fan of the Encode::Locale module that automatically creates aliases for the current console's code pages. It works well with Win32, OS X & other flavors of *nix.
#!/usr/bin/perl
use strict;
use warnings;
# These two lines make life better when you leave the world of ASCII
# Just remember to *save* the file as UTF8....
use utf8;
use feature 'unicode_strings';
use Encode::Locale 'decode_argv'; # We'll use the console_in & console_out aliases as well as decode_argv().
use Encode;
binmode(STDIN, ":encoding(console_in)");
binmode(STDOUT, ":encoding(console_out)");
binmode(STDERR, ":encoding(console_out)");
decode_argv( ); # Decode ARGV in place
my ($name, $number) = #ARGV;
if (not defined $name) {
die "Need name\n";
}
if (defined $number) {
print "Save '$name' and '$number'\n";
# save name/number in database
exit;
}
if ($name eq 'ö') {
print "Fetch umlaut 'oe'\n";
} elsif ($name eq 'o') {
print "Fetch simple 'o'\n";
} else {
print "Fetch other '$name'\n";
}
print "ü";
Perhaps it's only syntactic sugar, but it makes easy reading and promotes cross-platform compatibility.

I think that the code answers to this question are well pointed but not complete:
that way , it is very complicated to construct a script with all the code page + source codification in mind, and moreover, it would be harder to make it portable: ö may be known to latin alphabet users, but の or 렌 also exist...
they may run ok with chars in a particular code page, but with chars outside it, they will fail (which is probably the case with some users in the comments). Note that Windows' Code Pages are previous to Unicode.
The fundamental problem is that Perl 5 for Windows is not compiled with Unicode support as Windows understands it: it is just a port of the linux code, and so, almost all Unicode chars are mangled before they even reach the Perl code.
A longer technical explanation (and a C patch!) is provided by A. Sinan Unur's page Fixing Perl's Unicode problems on the command line on Windows: A trilogy in N parts (under Artistic License 2.0).
So (but not for the faint of spirit): a recompilation of perl.exe is possible and almost fully Unicode compliant in Windows. Hopefully they'll be integrated some day in the source code... Until them I've resumed some detailed instructions to patch perl.exe here.
Note also that a proper command console with full Unicode support is needed. A quick solution is to use ConEmu, but Windows' cmd.exe could also work after some heavy tweaks.

I don't know if this is the solution for very scenario, but I could get away by using the parameter "-CAS" when calling my script.
Example:
Script_1:
use strict;
use utf8;
$|++; # Prevent buffering issues
my ($arg) = #ARGV;
save_to_file('test.txt', $arg);
sub save_to_file{
my ($filename, $content) = #_;
open(my $fh, '>:encoding(UTF-8)', $filename) or die "Can't open < $filename: $!";;
print $fh $content;
close $fh;
return;
}
Script_2 calling 1:
use strict;
use utf8;
execute_command();
sub execute_command {
my $command = "perl -CAS simple_utf_string.pl äääöööü";
# Execute command
print "The command to run is: $command\n";
open my $command_pipe, "-|:encoding(UTF-8)", $command or die "Pipe from $command failed: $!";
while (<$command_pipe>) {
print $_;
}
}
Result in: text.txt:
äääöööü

Related

How to check for a string using Perl given

I'm trying to replace a particular line in a file. I can get my program to run, but it doesn't actually do the replacing that I want it to.
Here is my sample file:
test line 1
test line 2
line to be overwritten
test line 3
Here is the code that I have:
my $origFile = $file_path . "junk\.file";
my $newFile = $file_path . "junk\.file\.backup";
# system command to make a backup of the file
system "mv $origFile $newFile";
#opens the files
open( my $INFILE, $newFile ) || die "Unable to read $newFile\n";
open( my $OUTFILE, '>' . $origFile ) || die "Unable to create $origFile\n";
# While loop to read in the file line by line
while ( <$INFILE> ) {
given ($_) {
when ("line to be overwritten") {
print $OUTFILE "line has been overwritten\n";
}
default {
print $OUTFILE $_;
}
}
}
close($INFILE);
close($OUTFILE);
I've tried to change the when statements several different ways to no avail:
when ($_ eq "line to be overwritten")
when ($_ == "line to be overwritten")
when ($_ cmp "line to be overwritten")
But those only generate errors. Anyone know what I'm doing wrong here?

As highlighted in a comment on the original question, given/when is an experimental feature of perl. I would personally recommend using if/else in a loop, and then either use string equality or a regex to match the line(s) you want to replace. A quick example:
use strict;
use warnings;
while(my $line = <DATA>) {
if ( $line =~ /line to be overwritten/ ) {
print "Overwritten\n";
} else {
print $line;
}
}
__DATA__
test line 1
test line 2
line to be overwritten
test line 3
This will give the output:
test line 1
test line 2
Overwritten
test line 3
You could also use the string equality if you aren't confident in your regex, or the string is guaranteed to be the same:
...
if ($line eq 'line to be overwritten') {
...
Sidenotes
open
On your initial open, it is recommended to use the 3 argument version of open to save from unexpected issues:
open(my $INFILE, '<', $newFile) || die "Unable to read $newFile\n";
(for more info on this, see here: http://modernperlbooks.com/mt/2010/04/three-arg-open-migrating-to-modern-perl.html)
strict & warnings
Also, it is recommended to use strict and warnings in your code file, as seen in my example above - this will save you from accidental mistakes like trying to use a variable which has not been declared, and syntax errors which may give you head-scratching results!
Experimental Features
Experimental features in perl are where there is no guarantee made for backwards compatibility to be maintained when a new release of perl comes out. Obviously if you are using the same version of perl everywhere it should be compatible, but things may break if you update to another major version of perl. answered here as I dont have the reputation to answer in the comments...

You seem to be making it way more complicated than it needs to be - a simple regex to check each line and act accordingly should do the job.
while(<$INFILE>)
{
chomp($_);
if /^line to be overwritten$/ )
{
print $OUTFILE "line has been overwritten\n";
}
else
{
print $OUTFILE "$_\n";
}
}

One way to do it is to use Tie::File module. It allows to replace data right in the file. You can make the backup same way you are currently doing, before changing the original file.
use strict;
use warnings;
use Tie::File;
my $file = 'test.txt';
tie my #textFile, 'Tie::File', $file, recsep => "\n" or die $!;
s/line to be overwritten/line has been overwritten/ for #textFile;
untie #textFile;

The diamond operator seems to work only once

I am writing a script in Perl where I have to open the same file twice in my code. This is my outline of the code:
#!/usr/bin/perl
use strict;
use warnings;
my %forward=();
my %reverse=();
while(<>){
chomp;
# store something
}
}
while(<>){ # open the same file again
chomp;
#print something
}
I am using the diamond operator so I am running the script like this
perl script.pl input.txt
But this is not producing any output. If I open the File using filehandle, the script works. What can be possibly wrong here?

Save your #ARGV before exhausting it. Of course, this will only work for actual files specified on the command line, and not with STDIN.
#!/usr/bin/env perl
use strict;
use warnings;
run(#ARGV);
sub run {
my #argv = #_;
first(#argv);
second(#argv);
}
sub first {
local #ARGV = #_;
print "First pass: $_" while <>;
}
sub second {
local #ARGV = #_;
print "Second pass: $_" while <>;
}

You read all there was to be read in the first loop, leaving nothing to read in the second.
If the input aren't huge, you can simply load it into memory.
my #lines = <>;
chomp( #lines );
for (#lines) {
...
}
for (#lines) {
...
}

Debug a Perl script

I already did some research on Perl script debugging but couldn't find what I was looking for.
Let me explain my problem here.
I have a Perl script which is not entering into last while loop it seems cos it is not printing anything inside as instructed.
So, I want to know is there any easier method available to see all lines one by one like we can see in shell script using
set -x
Here is my Perl script code
#!/usr/bin/perl -w
my $ZONEADM = "/usr/sbin/zoneadm list -c";
use strict;
use diagnostics;
use warnings;
system("clear");
print "Enter the app\n";
chomp(my $INS = <>);
print "\nEnter the Symmitrix ID\n";
chomp(my $SYMM = <>);
print "\nEnter the Server\n";
chomp(my $SRV = <>);
print "\nEnter the devices\n";
while (<>) {
if($_ !~ m/(q|quit)/) {
chomp($_);
my $TEMP_FILE = "/export/home/ptiwari/scripts/LOG.11";
open (my $FH, '>>', $TEMP_FILE);
my #arr = split(/:/, $_);
if($arr[3]) {
print $FH "/".$INS."db/".$arr[0]." ".$SYMM." ".$arr[1]." ".$arr[2]." ".$arr[3]."\n";
}
else {
print $FH "/".$INS."db/".$arr[0]." ".$SYMM." ".$arr[1]." ".$arr[2]."\n";
}
undef #arr;
close $FH;
}
else {
exit;
}
}
my $IS_ZONE = qx($ZONEADM|grep -i $SRV|grep -v global);
if($IS_ZONE) {
$IS_ZONE = "yes";
}
else {
$IS_ZONE = "no";
}
open(my $FLH, '<', "/export/home/ptiwari/scripts/LOG.11");
my #lines;
while(<$FLH>) {
my ($GLOBAL_MTPT, $SYM, $SYM_DEV, $SIZE, $NEWFS) = split;
print $GLOBAL_MTPT." ".$SYM." ".$SYM_DEV;
print "\n";
}
I already tried perl -d but it didn't show me anything which can help me to troubleshoot why it didn't enter the while loop.

Your while(<>) loop doesn't have sensible termination conditions. The /q|quit/ regex is buggy.
You exit the whole script if any line contains q or quit. You will also exit, if the device descriptions contains things like quill or acquisition. The effect of typing an accidental q is similar to a CtrlC.
The only way to finish the loop and go on with the script is to send an EOF. This requires the user to punch CtrlD into the keyboard, or a file to simply end. Then your script will continue.
There are some other things wrong/weird with this script.
Main criticism: (a) all-uppercase variables are informally reserved for Perl and pragmatic modules. Lowercase or mixed case variables work too. (b) Your script contains quite some redundant code. Either refactor it into subs, or rewrite your logic
Here is an example rewrite that may be easier to debug / may not contain some of the bugs.
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
use constant DEBUG_FLAG => 1; # set to false value for release
my $zoneadm_command = "/usr/sbin/zoneadm list -c";
my $temp_file_name = "/export/home/ptiwari/scripts/LOG.11";
sub prompt { print "\n", $_[0], "\n"; my $answer = <>; chomp $answer; return $answer }
sub DEBUG { print STDERR "DEBUG> ", #_, "\n" if DEBUG_FLAG }
system("clear");
my $app_name = prompt("Enter the app");
my $symm_id = prompt("Enter the Symmitrix ID");
my $server = prompt("Enter the server name");
print "Enter the devices.\n";
print qq(\tTo terminate the script, type "q" or "quit".\n);
print qq(\tTo finish the list of devices, type Ctrl+D.\n);
open my $temp_file, ">>", $temp_file_name
or die "Can't open log file: $!";
while (<>) {
chomp; # remove trailing newline
exit if /^q(?:uit)?$/; # terminate the script if the input line *is* `q` or `quit`.
my #field = split /:/;
# grep: select all true values
#field = grep {$_} ("/${app_name}db/$field[0]", $symm_id, #field[1 .. 3]);
print $temp_file join(" ", #field), "\n";
}
close $temp_file;
DEBUG("finished the reading loop");
# get the zones with only *one* extra process
my #zones =
grep {not /global/}
grep {/\Q$server\E/i}
map {chomp; $_}
qx($zoneadm_command);
my $is_zone = #zones ? "yes" : "no";
DEBUG("Am I in the zone? $is_zone");
open my $device_file, "<", $temp_file_name or die "Can't open $temp_file_name: $!";
while (<$device_file>) {
chomp;
my ($global_mtpt, $sym, $sym_dev) = split;
print join(" ", $global_mtpt, $sym, $sym_dev), "\n";
# or short: print join(" ", (split)[0 .. 2]), "\n";
}

You need something like this for stepping into the script:
http://www.devshed.com/c/a/Perl/Using-The-Perl-Debugger/

You can really use the debugger: http://perldoc.perl.org/perldebug.html
But if your preference is to trace like bash -x, take a look at this discussion:
http://www.perlmonks.org/?node_id=419653

The Devel::Trace Perl module is designed to mimic sh -x tracing for shell programs.

Try to remove the "my $" from the last open statement and the "$" from there in the last while statement. Or better yet, try this:
open(my FLH, '<', "/export/home/ptiwari/scripts/LOG.11");
my #lines = <FLH>;
foreach (#lines) {
my ($GLOBAL_MTPT, $SYM, $SYM_DEV, $SIZE, $NEWFS) = split;
print $GLOBAL_MTPT." ".$SYM." ".$SYM_DEV;
print "\n";
}
Let me know about the results.

Perl Frequency sorting and other things

Edit:
So I got the script working great with all your help, so thanks a lot.
I'm also trying to figure out how I can keep the standard input choice but still be able to use a command-line "start" aswell,
I want both to be able to start it by for ex. "perl wfreq.pl" and it then asks the user what file, but I also want to be able to start it by saying "perl wfreq.pl example.txt" and then it shouldnt ask for the user input.
Is this possible?
#! /usr/bin/perl
use utf8;
use warnings;
print "Please enter the name of the file: \n" ;
$file = <STDIN>;
chop $file;
open(my $DATA, "<:utf8", $file) or die "Oops!!: $!";
binmode STDOUT, ":utf8";
while(<$DATA>) {
tr/A-Za-z//cs;
s/[;:()".,!?]/ /gio;
foreach $word (split(' ', lc $_)) {
$freq{$word}++;
}
}
foreach $word (sort { $freq{$b} <=> $freq{$a} } keys %freq) {
#fr = (#fr, $freq{$word});
#ord = (#ord, $word);
}
for ($v =0; $v < 10; $v++){
print " $fr[$v] | $ord[$v]\n";
}

As MVS wrote, you need to use the "spaceship" <=> operator and sort not keys, but values.
Here's example that should work:
Change
foreach $word (sort keys %freq) {
print "$word $freq{$word}\n";
}
To
foreach $word (sort { $freq{$a} <=> $freq{$b} } keys %freq) {
print "$word $freq{$word}\n";
}
perldoc -f sort contains just the same example at lines 23-27 of code snippet.
Talking about encoding: utf8 pragma documentation says to use it for
enable/disable UTF-8 (or UTF-EBCDIC) in source code
To enable UTF-8 in file input you need to open file in specific mode (using specific layer) and apply 'utf8' layer to STDOUT:
open(my $DATA, "<:utf8", $file) or die "Oops!!: $!";
binmode STDOUT, ":utf8";
For more information about :utf8 and other IO Layers you can read
:utf8 layer description
perldoc -f open
perldoc -f binmode

sort keys will sort by the keys of the hash, which are the words themselves. Instead, you'll want to sort by the values instead. Have a look at the documentation to help you (in particular, the parts about the "spaceship operator" <=>).
To put the numbers before the words, just switch $word and $freq{$word} in the print statement.
As for returning only the top 10 results, try a counter inside your foreach loop (and the break statement).
For UTF-8 characters, your use utf8 should be good enough to enable them, but if you're encountering problems, this might help.

You can use bunch of neat things in Perl:
perl -CS -F\\PL+ -alne'$f{lc$_}++for#F}{print"$f{$_} $_"for sort{$f{$b}<=>$f{$a}}keys%f'
See perlrun(1) for more details.

Perl - Command line input and STDIN

I've created this script below for a assignment I have. It asks for a text file, checks the frequency of words, and lists the 10 words that appear the most times. Everything is working fine, but I need this script to be able to start via the command line as well as via the standard input.
So I need to be able to write 'perl wfreq.pl example.txt' and that should start the script and not ask the question for a text file. I'm not sure how to accomplish this really. I think I might need a while loop at the start somewhere that skips the STDIN if you give it the text file on a terminal command line.
How can I do it?
The script
#! /usr/bin/perl
use utf8;
use warnings;
print "Please enter the name of the file: \n" ;
$file = <STDIN>;
chop $file;
open(my $DATA, "<:utf8", $file) or die "Oops!!: $!";
binmode STDOUT, ":utf8";
while(<$DATA>) {
tr/A-Za-z//cs;
s/[;:()".,!?]/ /gio;
foreach $word (split(' ', lc $_)) {
$freq{$word}++;
}
}
foreach $word (sort { $freq{$b} <=> $freq{$a} } keys %freq) {
#fr = (#fr, $freq{$word});
#ord = (#ord, $word);
}
for ($v =0; $v < 10; $v++) {
print " $fr[$v] | $ord[$v]\n";
}

Instead of reading from <STDIN>, you can read from <> to get data either from files provided on the command line or from stdin if there are no files.
For example, with the program:
#!/usr/bin/env perl
while (<>) {
print $_;
}
The command ./script foo.txt will read and print lines from foo.txt, while ./script by itself will read and print lines from standard input.

You need to do the following:
my $DATA;
my $filename = $ARGV[0];
unless ($filename) {
print "Enter filename:\n";
$filename = <STDIN>;
chomp $filename;
}
open($DATA, $filename) or die $!;
Though I have to say, user-prompts are very un-Unix like.

perl script.pl < input.txt
The use of the operator < passes input.txt to script.pl as standard input. You can then skip querying for the filename. Otherwise, use $ARGV[0] or similar, if defined.

You can check for a command-line argument in #ARGV, which is Perl's array that automagically grabs command line arguments, and --if present-- process them (else continue with input from STDIN). Something like:
use utf8;
use strict; #Don't ever forget this! Always, always, ALWAYS use strict!
use warnings;
if(#ARGV)
{
#Assume that the first command line argument is a file to be processed as in your original code.
#You may or may not want to care if additional command line arguments are passed. Up to you.
}
else
{
#Ask for the filename and proceed as normal.
}
Note that you might want to isolate the code for processing the file name (i.e., the opening of DATA, going through the lines, counting the words, etc.) to a subroutine that you can easily call (perhaps with an argument consisting of the file name) in each branch of the code.
Oh, and to make sure I'm clear about this: always use strict and always use warnings. If you don't, you're asking for trouble.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Perl unicode support from console (#ARGV) /Windows/ - perl

Related

How to check for a string using Perl given

The diamond operator seems to work only once

Debug a Perl script

Perl Frequency sorting and other things

Perl - Command line input and STDIN

Categories

Resources