diff behaviour between perl 5.8.8 and 5.10.1 - perl

I have this script:
#!/usr/bin/perl
$LOGFILE = "Soccer.txt";
open(LOGFILE) or die("Could not open log file.");
foreach $line (<LOGFILE>) {
chomp $line;
($hour, $match, $idh, $back1, $lay1, $idd, $backx, $layx, $ida, $back2, $lay2) = split(/\s*,\s*/,$line);
$match =~ s/^(\d{2}):(\d{2}) //g; #/ # fix highlighting
($hteam,$ateam) = split(/\s*ยง\s*/,$match,2);
$hteam = get_name($hteam);
$ateam = get_name($ateam);
$match = "$hteam - $ateam";
$foo=qq($hour "$match" $idh $back1 $lay1 $idd $backx $layx $ida $back2 $lay2 \n) ;
print $foo;
}
sub get_name {
# Return the full name for the team, if it exists, otherwise return the original
my %alias = (
"Aalen" => "Vfr Aalen",
"Accrington" => "Accrington Stanley",
"Accrington St" => "Accrington Stanley",
"Adelaide Utd" => "Adelaide United Fc"
);
return $alias{$_[0]} // $_[0];
}
This scrpit works perfect in my system:
perl, v5.10.1 (*) built for i686-linux-gnu-thread-multi
Now I would like run it in a different system:
perl, v5.8.8 built for x86_64-linux-thread-multi
When I try to run it, I get this error:
Search pattern not terminated at ./scriptbd.robot.pl line 458.
Why am I getting an error?

return $alias{$_[0]} // $_[0];
The // operator was added in 5.10. To make it work on earlier Perls, rewrite it:
return (defined $alias{$_[0]}) ? $alias{$_[0]} : $_[0];

Related

How to distinguish between "0" and NULL in perl?

Here we are looking for the string "reftext" in the given file. The line next to this contains a string with 3 integers. So we are extracting them in #all_num. We are printing the value of #all_num[2] only if is not NULL. But the logic used here doesn't print #all_num[2] even if it has 0.
#!/usr/bin/perl
open( READFILE, "<myfile.txt" );
#list = <READFILE>;
$total_lines = scalar #list;
for ( $count = 0; $count < $total_lines; $count++ ) {
if (#list[ $count =~ /reftext/ )
{
#all_num = #list[ $count + 1 ] =~ /(\d+)/g;
if ( #all_num[2] != NULL ) {
print "#all_num[2]\n";
}
}
}
Hope this helps,
use strict;
use warnings;
my #fvals = (
[ i => undef ],
[ j => 0 ],
[ k => "" ],
);
for my $r (#fvals) {
my ($k, $v) = #$r;
if (!defined($v)) { print "$k is undef\n"; }
elsif (!length($v)) { print "$k is empty string\n"; }
# elsif (!$v) { print "$k is zero\n"; }
# recognizes zero value in "0.0" or "0E0" notation
elsif ($v == 0) { print "$k is zero\n"; }
}
output
i is undef
j is zero
k is empty string
Perl does not include a NULL, so the line
if(#all_num[2]!= NULL)
is nonsensical in Perl. (More accurately, it attempts to locate a sub named NULL and run it to get the value to compare against #all_num[2], but fails to do so because you (presumably) haven't defined such a sub.) Note that, if you had enabled use strict, this would cause a fatal error instead of pretending to work. This is one of the many reasons to always use strict.
Side note: When you pull a value out of an array, it's only a single value, so you should say $all_num[2] rather than #all_num[2] when referring to the third element of the array #all_num. (Yes, this is a little confusing to get used to. I hear that it's been changed in Perl 6, but I'm assuming you're using Perl 5 here.) Note that, if you had enabled use warnings, it would have told you that "Scalar value #all_num[2] better written as $all_num[2]". This is one of the many reasons to always use warnings.
If you want to test whether $all_num[2] contains a value, the proper way to express that in Perl is
if (defined $all_num[2])
This is how your program would look using best practices
You should
Always use strict and use warnings, and declare all your variables with my
Use the three-parameter form of open
Check that open calls succeeded, and include $! in the die string if not
Use a while loop to process a file one line at a time, in preference to reading the entire file into memory
#!/usr/bin/perl
use strict;
use warnings;
open my $fh, '<', 'myfile.txt' or die $!;
while ( <$fh> ) {
next unless /reftext/;
my $next_line = <$fh>;
my #all_num = $next_line =~ /\d+/g;
print "$all_num[2]\n" if defined $all_num[2];
}
Try this:
#!/usr/bin/perl
use warnings;
use strict;
open(READFILE, "<", "myfile.txt") or die $!;
my #list = <READFILE>;
my $total_lines = scalar #list;
close (READFILE);
for(my $count=0; $count<$total_lines; $count++)
{
if($list[$count] =~ /reftext/)
{
my #all_num = $list[$count+1] =~ /(\d+)/g;
if($all_num[2] ne '')
{
print "$all_num[2]\n";
}
}
}
To check a variable is null or not:
if ($str ne '')
{
print $str;
}
or better:
my ($str);
$str = "";
if (defined($str))
{
print "defined";
}
else
{
print "not defined";
}
If the other answers do not work, try treating the variable as a string:
if ( $all_num[2] == 'null' && length($all_num[2]) == 4 ){
# null
} else {
# not null
}
As with any code you write, be sure to test your code.

Perl word Stemming English text

I am trying to stem an English text, I read a lot of forums but I couldn't see a clear example. I am using porter stemmer as in using Text::ENglish.
This is how far I got:
use Lingua::StopWords qw(getStopWords);
my $stopwords = getStopWords('en');
use Text::English;
#stopwords = grep { $stopwords->{$_} } (keys %$stopwords);
chdir("c:/Test Facility/input");
#files = <*>;
foreach $file (#files)
{
open (input, $file);
while (<input>)
{
open (output,">>c:/Test Facility/normalized/".$file);
chomp;
for my $w (#stopwords)
{
s/\b\Q$w\E\b//ig;
}
$_ =~s/<[^>]*>//g;
$_ =~ s/[[:punct:]]//g;
##What should I write here to apply porter stemming using Text::English##
print output "$_\n";
}
}
close (input);
close (output);
Run the following code like this:
perl stemmer.pl /usr/lib/jvm/java-6-sun-1.6.0.26/jre/LICENSE
It produces output similar to:
operat system distributor licens java version sun microsystems inc sun willing to license java platform standard edition developer kit jdk
Note that strings with length 1 and numeric values are removed, besides stopwords.
#!/usr/bin/env perl
use common::sense;
use Encode;
use Lingua::Stem::Snowball;
use Lingua::StopWords qw(getStopWords);
use Scalar::Util qw(looks_like_number);
my $stemmer = Lingua::Stem::Snowball->new(
encoding => 'UTF-8',
lang => 'en',
);
my %stopwords = map {
lc
} keys %{getStopWords(en => 'UTF-8')};
local $, = ' ';
say map {
sub {
my #w =
map {
encode_utf8 $_
} grep {
length >= 2
and not looks_like_number($_)
and not exists $stopwords{lc($_)}
} split
/[\W_]+/x,
shift;
$stemmer->stem_in_place(\#w);
map {
lc decode_utf8 $_
} #w
}->($_);
} <>;

How can I still get automatic assignment to '$_' with a mocked 'readline' function?

Perl has some special handling for the readline function (and the equivalent <> I/O operator) where it treats the expressions
while (<HANDLE>)
while (readline(HANDLE))
as equivalent to
while (defined($_ = <HANDLE>))
cf.
$ perl -MO=Deparse -e 'f($_) while <>'
f($_) while defined($_ = <ARGV>); <--- implicitly sets $_
-e syntax OK
But this automatic assignment doesn't seem to happen if you hijack the readline function:
$ perl -MO=Deparse -e 'BEGIN {
> *CORE::GLOBAL::readline = sub { }
> }
> f($_) while <>'
sub BEGIN {
*CORE::GLOBAL::readline = sub {
};
}
f($_) while readline(ARGV); <--- doesn't set $_ !
-e syntax OK
Of course, this will make the custom readline function work incorrectly for a lot of legacy code. The output of this code is "foo" with the BEGIN block and "bar" without it, but I want it to be "BAR".
use warnings;
BEGIN { *CORE::GLOBAL::readline = \&uc_readline; }
sub uc_readline {
my $line = CORE::readline(shift || *ARGV);
return uc $line if defined $line;
return;
}
($_, $bar) = ("foo\n", "bar\n");
open X, '<', \$bar;
while (<X>) {
print $_; # want and expect to see "BAR\n"
}
What options do I have to hijack the readline function but still get the proper treatment of the while (<...>) idiom? It's not practical to explicitly convert everything to while (defined($_=<...>)) in all the legacy code.
This is a fairly dirty hack using overloading to detect boolean context, but it seems to do the trick. It certainly needs more testing than I have given it before using this solution in a production environment:
use warnings;
BEGIN { *CORE::GLOBAL::readline = \&uc_readline; }
sub uc_readline {
my $line = CORE::readline(shift || *ARGV);
return Readline->new(uc $line) if defined $line;
return;
}
{package Readline;
sub new {shift; bless [#_]}
use overload fallback => 1,
'bool' => sub {defined($_ = $_[0][0])}, # set $_ in bool context
'""' => sub {$_[0][0]},
'+0' => sub {$_[0][0]};
}
my $bar;
($_, $bar) = ("foo\n", "bar\n");
open X, '<', \$bar;
while (<X>) {
print $_; # want and expect to see "BAR\n"
}
which prints:
BAR
This will also make if (<X>) {...} set $_. I don't know if there is a way to limit the magic to only while loops.
This code:
use warnings;
BEGIN { *CORE::GLOBAL::readline = \&uc_readline; }
sub uc_readline {
my $line = CORE::readline(shift || *ARGV);
return unless defined $line;
$line = uc $line;
$_ = $line;
return $line;
}
($_, $bar) = ("foo\n", "bar\n");
open X, '<', \$bar;
while (<X>) {
print $_; # want and expect to see "BAR\n"
}
print "$_"; # prints "BAR" instad of "foo"
does almost the right thing, but $_ is not localised, so after the loop, $_ is set to the last value read from the filehandle. Adding Scope::Upper to the mix fixes that:
use warnings;
use Scope::Upper qw/localize SCOPE/;
BEGIN { *CORE::GLOBAL::readline = \&uc_readline; }
sub uc_readline {
my $line = CORE::readline(shift || *ARGV);
return unless defined $line;
$line = uc $line;
local $_ = $line;
# localize $_ in the scope of the while
localize *main::_, \$line, SCOPE(1);
return $line;
}
($_, $bar) = ("foo\n", "bar\n");
open X, '<', \$bar;
while (<X>) {
print "$_"; # want and expect to see "BAR\n"
}
print "$_"; # will print 'foo', not "BAR"

Where can I find an array of the (un)assigned Unicode code points for a particular block?

At the moment, I'm writing these arrays by hand.
For example, the Miscellaneous Mathematical Symbols-A block has an entry in hash like this:
my %symbols = (
...
miscellaneous_mathematical_symbols_a => [(0x27C0..0x27CA), 0x27CC,
(0x27D0..0x27EF)],
...
)
The simpler, 'continuous' array
miscellaneous_mathematical_symbols_a => [0x27C0..0x27EF]
doesn't work because Unicode blocks have holes in them. For example, there's nothing at 0x27CB. Take a look at the code chart [PDF].
Writing these arrays by hand is tedious, error-prone and a bit fun. And I get the feeling that someone has already tackled this in Perl!
Perhaps you want Unicode::UCD? Use its charblock routine to get the range of any named block. If you want to get those names, you can use charblocks.
This module is really just an interface to the Unicode databases that come with Perl already, so if you have to do something fancier, you can look at the lib/5.x.y/unicore/UnicodeData.txt or the various other files in that same directory to get what you need.
Here's what I came up with to create your %symbols. I go through all the blocks (although in this sample I skip that ones without "Math" in their name. I get the starting and ending code points and check which ones are assigned. From that, I create a custom property that I can use to check if a character is in the range and assigned.
use strict;
use warnings;
digest_blocks();
my $property = 'My::InMiscellaneousMathematicalSymbolsA';
foreach ( 0x27BA..0x27F3 )
{
my $in = chr =~ m/\p{$property}/;
printf "%X is %sin $property\n",
$_, $in ? '' : ' not ';
}
sub digest_blocks {
use Unicode::UCD qw(charblocks);
my $blocks = charblocks();
foreach my $block ( keys %$blocks )
{
next unless $block =~ /Math/; # just to make the output small
my( $start, $stop ) = #{ $blocks->{$block}[0] };
$blocks->{$block} = {
assigned => [ grep { chr =~ /\A\p{Assigned}\z/ } $start .. $stop ],
unassigned => [ grep { chr !~ /\A\p{Assigned}\z/ } $start .. $stop ],
start => $start,
stop => $stop,
name => $block,
};
define_my_property( $blocks->{$block} );
}
}
sub define_my_property {
my $block = shift;
(my $subname = $block->{name}) =~ s/\W//g;
$block->{my_property} = "My::In$subname"; # needs In or Is
no strict 'refs';
my $string = join "\n", # can do ranges here too
map { sprintf "%X", $_ }
#{ $block->{assigned} };
*{"My::In$subname"} = sub { $string };
}
If I were going to do this a lot, I'd use the same thing to create a Perl source file that has the custom properties already defined so I can just use them right away in any of my work. None of the data should change until you update your Unicode data.
sub define_my_property {
my $block = shift;
(my $subname = $block->{name}) =~ s/\W//g;
$block->{my_property} = "My::In$subname"; # needs In or Is
no strict 'refs';
my $string = num2range( #{ $block->{assigned} } );
print <<"HERE";
sub My::In$subname {
return <<'CODEPOINTS';
$string
CODEPOINTS
}
HERE
}
# http://www.perlmonks.org/?node_id=87538
sub num2range {
local $_ = join ',' => sort { $a <=> $b } #_;
s/(?<!\d)(\d+)(?:,((??{$++1})))+(?!\d)/$1\t$+/g;
s/(\d+)/ sprintf "%X", $1/eg;
s/,/\n/g;
return $_;
}
That gives me output suitable for a Perl library:
sub My::InMiscellaneousMathematicalSymbolsA {
return <<'CODEPOINTS';
27C0 27CA
27CC
27D0 27EF
CODEPOINTS
}
sub My::InSupplementalMathematicalOperators {
return <<'CODEPOINTS';
2A00 2AFF
CODEPOINTS
}
sub My::InMathematicalAlphanumericSymbols {
return <<'CODEPOINTS';
1D400 1D454
1D456 1D49C
1D49E 1D49F
1D4A2
1D4A5 1D4A6
1D4A9 1D4AC
1D4AE 1D4B9
1D4BB
1D4BD 1D4C3
1D4C5 1D505
1D507 1D50A
1D50D 1D514
1D516 1D51C
1D51E 1D539
1D53B 1D53E
1D540 1D544
1D546
1D54A 1D550
1D552 1D6A5
1D6A8 1D7CB
1D7CE 1D7FF
CODEPOINTS
}
sub My::InMiscellaneousMathematicalSymbolsB {
return <<'CODEPOINTS';
2980 29FF
CODEPOINTS
}
sub My::InMathematicalOperators {
return <<'CODEPOINTS';
2200 22FF
CODEPOINTS
}
Maybe this?
my #list =
grep {chr ($_) =~ /^\p{Assigned}$/}
0x27C0..0x27EF;
#list = map { $_ = sprintf ("%X", $_ )} #list;
print "#list\n";
Gives me
27C0 27C1 27C2 27C3 27C4 27C5 27C6 27C7 27C8 27C9 27CA 27D0 27D1 27D2 27D3
27D4 27D5 27D6 27D7 27D8 27D9 27DA 27DB 27DC 27DD 27DE 27DF 27E0 27E1 27E2
27E3 27E4 27E5 27E6 27E7 27E8 27E9 27EA 27EB
I don't know why you wouldn't say miscellaneous_mathematical_symbols_a => [0x27C0..0x27EF], because that's how the Unicode standard is defined according to the PDF.
What do you mean when you say it doesn't "work"? If it's giving you some sort of error when you check the existence of the character in the block, then why not just weed them out of the block when your checker comes across an error?

Access to Perl's empty angle "<>" operator from an actual filehandle?

I like to use the nifty perl feature where reading from the empty angle operator <> magically gives your program UNIX filter semantics, but I'd like to be able to access this feature through an actual filehandle (or IO::Handle object, or similar), so that I can do things like pass it into subroutines and such. Is there any way to do this?
This question is particularly hard to google, because searching for "angle operator" and "filehandle" just tells me how to read from filehandles using the angle operator.
From perldoc perlvar:
ARGV
The special filehandle that iterates over command-line filenames in #ARGV. Usually written as the null filehandle in the angle operator <>. Note that currently ARGV only has its magical effect within the <> operator; elsewhere it is just a plain filehandle corresponding to the last file opened by <>. In particular, passing \*ARGV as a parameter to a function that expects a filehandle may not cause your function to automatically read the contents of all the files in #ARGV.
I believe that answers all aspects of your question in that "Hate to say it but it won't do what you want" kind of way. What you could do is make functions that take a list of filenames to open, and do this:
sub takes_filenames (#) {
local #ARGV = #_;
// do stuff with <>
}
But that's probably the best you'll be able to manage.
Expanding on Chris Lutz's idea, here is a very rudimentary implementation:
#!/usr/bin/perl
package My::ARGV::Reader;
use strict; use warnings;
use autodie;
use IO::Handle;
use overload
'<>' => \&reader,
'""' => \&argv,
'0+' => \&input_line_number,
;
sub new {
my $class = shift;
my $self = {
names => [ #_ ],
handles => [],
current_file => 0,
};
bless $self => $class;
}
sub reader {
my $self = shift;
return scalar <STDIN> unless #{ $self->{names}};
my $line;
while ( 1 ) {
my $current = $self->{current_file};
return if $current >= #{ $self->{names} };
my $fh = $self->{handles}->[$current];
unless ( $fh ) {
$self->{handles}->[$current] = $fh = $self->open_file;
}
if( eof $fh ) {
close $fh;
$self->{current_file} = $current + 1;
next;
}
$line = <$fh>;
last;
}
return $line;
}
sub open_file {
my $self = shift;
my $name = $self->{names}->[ $self->{current_file} ];
open my $fh, '<', $name;
return $fh;
}
sub argv {
my $self = shift;
my $name = #{$self->{names}}
? $self->{names}->[ $self->{current_file} ]
: '-'
;
return $name;
}
sub input_line_number {
my $self = shift;
my $fh = #{$self->{names}}
? $self->{handles}->[$self->{current_file}]
: \*STDIN
;
return $fh->input_line_number;
}
which can be used as:
package main;
use strict; use warnings;
my $it = My::ARGV::Reader->new(#ARGV);
echo($it);
sub echo {
my ($it) = #_;
printf "[%s:%d]:%s", $it, +$it, $_ while <$it>;
}
Output:
[file1:1]:bye bye
[file1:2]:hello
[file1:3]:thank you
[file1:4]:no translation
[file1:5]:
[file2:1]:chao
[file2:2]:hola
[file2:3]:gracias
[file2:4]:
It looks like this has already been implemented as Iterator::Diamond. Iterator::Diamond also disables the 2-argument-open magic that perl uses when reading <ARGV>. Even better, it supports reading '-' as STDIN, without enabling all the other magic. In fact, I might use it for that purpose just on single files.