Where is $_ being modified in this perl code? - perl

The following perl code generates a warning in PerlCritic (by Activestate):
sub natural_sort {
my #sorted;
#sorted = grep {s/(^|\D)0+(\d)/$1$2/g,1} sort grep {s/(\d+)/sprintf"%06.6d",$1/ge,1} #_;
}
The warning generated is:
Don't modify $_ in list functions
More info about that warning here
I don't understand the warning because I don't think I'm modifying $_, although I suppose I must be.
Can someone explain it to me please?

Both of your greps are modifying $_ because you're using s//. For example, this:
grep {s/(^|\D)0+(\d)/$1$2/g,1}
is the same as this:
grep { $_ =~ s/(^|\D)0+(\d)/$1$2/g; 1 }
I think you'd be better off using map as you are not filtering anything with your greps, you're just using grep as an iterator:
sub natural_sort {
my $t;
return map { ($t = $_) =~ s/(^|\D)0+(\d)/$1$2/g; $t }
sort
map { ($t = $_) =~ s/(\d+)/sprintf"%06.6d",$1/ge; $t }
#_;
}
That should do the same thing and keep critic quiet. You might want to have a look at List::MoreUtils if you want some nicer list operators than plain map.

You are doing a substitution ( i.e. s/// ) in the grep, which modifies $_ i.e. the list being grepped.

This and other cases are explained in perldoc perlvar:
Here are the places where Perl will
assume $_ even if you don't use it:
The following functions:
abs, alarm, chomp, chop, chr, chroot,
cos, defined, eval, exp, glob, hex,
int, lc, lcfirst, length, log, lstat,
mkdir, oct, ord, pos, print,
quotemeta, readlink, readpipe, ref,
require, reverse (in scalar context
only), rmdir, sin, split (on its
second argument), sqrt, stat, study,
uc, ucfirst, unlink, unpack.
All file tests (-f , -d ) except for -t , which defaults to STDIN.
See -X
The pattern matching operations m//, s/// and tr/// (aka y///) when
used without an =~ operator.
The default iterator variable in a foreach loop if no other variable is
supplied.
The implicit iterator variable in the grep() and map() functions.
The implicit variable of given().
The default place to put an input record when a operation's result
is tested by itself as the sole
criterion of a while test. Outside a
while test, this will not happen.

Many people have correctly answered that the s operator is modifying $_, however in the soon to be released Perl 5.14.0 there will be the new r flag for the s operator (i.e. s///r) which rather than modify in-place will return the modified elements. Read more at The Effective Perler . You can use perlbrew to install this new version.
Edit: Perl 5.14 is now available! Announcement Announcement Delta
Here is the function suggested by mu (using map) but using this functionality:
use 5.14.0;
sub natural_sort {
return map { s/(^|\D)0+(\d)/$1$2/gr }
sort
map { s/(\d+)/sprintf"%06.6d",$1/gre }
#_;
}

The VERY important part that other answers have missed is that the line
grep {s/(\d+)/sprintf"%06.6d",$1/ge,1} #_;
Is actually modifying the arguments passed into the function, and not copies of them.
grep is a filtering command, and the value in $_ inside the code block is an alias to one of the values in #_. #_ in turn contains aliases to the arguments passed to the function, so when the s/// operator performs its substitution, the change is being made to the original argument. This is shown in the following example:
sub test {grep {s/a/b/g; 1} #_}
my #array = qw(cat bat sat);
my #new = test #array;
say "#new"; # prints "cbt bbt sbt" as it should
say "#array"; # prints "cbt bbt sbt" as well, which is probably an error
The behavior you are looking for (apply a function that modifies $_ to a copy of a list) has been encapsulated as the apply function in a number of modules. My module List::Gen contains such an implementation. apply is also fairly simple to write yourself:
sub apply (&#) {
my ($sub, #ret) = #_;
$sub->() for #ret;
wantarray ? #ret : pop #ret
}
With that, your code could be rewritten as:
sub natural_sort {
apply {s/(^|\D)0+(\d)/$1$2/g} sort apply {s/(\d+)/sprintf"%06.6d",$1/ge} #_
}
If your goal with the repeated substitutions is to perform a sort of the original data with a transient modification applied, you should look into a Perl idiom known as the Schwartzian transform which is a more efficient way of achieving that goal.

Related

Does perl cache regex generation?

Suppose I have a function that dynamically generates regular expressions and then matches against them.
For example, in the following function match_here a \G anchor is inserted at the beginning of the regex. This simplifies the API because the caller does not need to remember to include the pos anchor in the pattern.
#!/usr/bin/env perl
use strict;
use warnings;
use Carp;
use Data::Dumper;
sub match_here {
my ($str, $index, $rg) = #_;
pos($str) = $index;
croak "index ($index) out of bounds" unless pos($str) == $index;
my $out;
if ($str =~ /\G$rg/) {
$out = $+[0];
}
return $out;
}
# no match starting at position 0
# prints '$VAR1 = undef;'
print Dumper(match_here("abc", 0, "b+"));
# match from 1 to 2
# prints '$VAR1 = 2;'
print Dumper(match_here("abc", 1, "b+"));
I'm wondering whether an anonymous regex object is "compiled" every time the function is evaluated or if there's some caching so that identical strings will not cause additional regex objects to be compiled.
Also, assuming that no caching is done by the Perl interpreter, is compiling a regex object expensive enough to be worth caching (possibly in an XS extension)?
From perlop(1), under the m// operator:
PATTERN may contain variables, which will be interpolated every time the pattern search is evaluated
[...]
Perl will not recompile the pattern unless an interpolated variable that it contains changes. You can force Perl to skip the test and never recompile by adding a "/o" (which stands for "once") after the trailing delimiter. Once upon a time, Perl would recompile regular expressions unnecessarily, and this modifier was useful to tell it not to do so, in the interests of speed.
So yes, there is a cache, and you can even force the use of the cache even when it's invalid by saying /o, but you really shouldn't do that.
But that cache only stores one compiled regexp per instance of the m// or s/// operator, so it only helps if the regexp is used with the same variables (e.g. your $rg) many times consecutively. If you alternate between calling it with $rg='b+' and $rg='c+' you will get a recompile every time.
For that kind of situation, you can do your own caching with the qr// operator. It explicitly compiles the regexp and returns an object that you can store and use to execute the regexp later. That could be incorporated into your match_here like this:
use feature 'state';
sub match_here {
my ($str, $index, $rg) = #_;
pos($str) = $index;
croak "index ($index) out of bounds" unless pos($str) == $index;
my $out;
state %rg_cache;
my $crg = $rg_cache{$rg} ||= qr/\G$rg/;
if ($str =~ /$crg/) {
$out = $+[0];
}
return $out;
}
To add more detail on the basic cache (when not using qr//): the fact that $rg is a newly allocated lexical variable each time makes no difference. It only matters that the value is the same as the previous one.
Here's an example to prove the point:
use re qw(Debug COMPILE);
while(<>) {
chomp;
# Insane interpolation. Do not use anything remotely like this in real code
print "MATCHED: $_\n" if /^${\(`cat refile`)}/;
}
Every time the match operator executes, it reads refile. The regular expression is ^ followed by the contents of refile. The debugging output shows that it is recompiled only if the contents of the file have changed. If the file still has the same contents as the last time, the operator notices that the same string is being passed to the regexp compiler again, and reuses the cached result.
Or try this less dramatic example:
use re qw(Debug COMPILE);
#patterns = (
'\d{3}',
'\d{3}',
'[aeiou]',
'[aeiou]',
'\d{3}',
'\d{3}'
);
for ('xyz', '123', 'other') {
for $i (0..$#patterns) {
if(/$patterns[$i]/) {
print "$_ matches $patterns[$i]\n";
} else {
print "$_ does not match $patterns[$i]\n";
}
}
}
in which there are 18 compilations and 11 of them are cache hits, even though the same "variable" (the same element of the #patterns array) is never used twice in a row.

Perl - How to create commands that users can input in console?

I'm just starting in Perl and I'm quite enjoying it. I'm writing some basic functions, but what I really want to be able to do is to use those functions intelligently using console commands. For example, say I have a function adding two numbers. I'd want to be able to type in console "add 2, 4" and read the first word, then pass the two numbers as parameters in an "add" function. Essentially, I'm asking for help in creating some basic scripting using Perl ^^'.
I have some vague ideas about how I might do this in VB, but Perl, I have no idea where I'd start, or what functions would be useful to me. Is there something like VB.net's "Split" function where you can break down the contents of a scalar into an array? Is there a simple way to analyse one word at a time in a scalar, or iterate through a scalar until you hit a separator, for example?
I hope you can help, any suggestions are appreciated! Bear in mind, I'm no expert, I started Perl all of a few weeks ago, and I've only been doing VB.net half a year.
Thank you!
Edit: If you're not sure what to suggest and you know any simple/intuitive resources that might be of help, that would also be appreciated.
Its rather easy to make a script which dispatches to a command by name. Here is a simple example:
#!/usr/bin/env perl
use strict;
use warnings;
# take the command name off the #ARGV stack
my $command_name = shift;
# get a reference to the subroutine by name
my $command = __PACKAGE__->can($command_name) || die "Unknown command: $command_name\n";
# execute the command, using the rest of #ARGV as arguments
# and print the return with a trailing newline
print $command->(#ARGV);
print "\n";
sub add {
my ($x, $y) = #_;
return $x + $y;
}
sub subtract {
my ($x, $y) = #_;
return $x - $y;
}
This script (say its named myscript.pl) can be called like
$ ./myscript.pl add 2 3
or
$ ./myscript.pl subtract 2 3
Once you have played with that for a while, you might want to take it further and use a framework for this kind of thing. There are several available, like App::Cmd or you can take the logic shown above and modularize as you see fit.
You want to parse command line arguments. A space serves as the delimiter, so just do a ./add.pl 2 3 Something like this:
$num1=$ARGV[0];
$num2=$ARGV[1];
print $num1 + $num2;
will print 5
Here is a short implementation of a simple scripting language.
Each statement is exactly one line long, and has the following structure:
Statement = [<Var> =] <Command> [<Arg> ...]
# This is a regular grammar, so we don't need a complicated parser.
Tokens are seperated by whitespace. A command may take any number of arguments. These can either be the contents of variables $var, a string "foo", or a number (int or float).
As these are Perl scalars, there is no visible difference between strings and numbers.
Here is the preamble of the script:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
strict and warnings are essential when learning Perl, else too much weird stuff would be possible. The use 5.010 is a minimum version, it also defines the say builtin (like a print but appends a newline).
Now we declare two global variables: The %env hash (table or dict) associates variable names with their values. %functions holds our builtin functions. The values are anonymous functions.
my %env;
my %functions = (
add => sub { $_[0] + $_[1] },
mul => sub { $_[0] * $_[1] },
say => sub { say $_[0] },
bye => sub { exit 0 },
);
Now comes our read-eval-loop (we don't print by default). The readline operator <> will read from the file specified as the first command line argument, or from STDIN if no filename is provided.
while (<>) {
next if /^\s*\#/; # jump comment lines
# parse the line. We get a destination $var, a $command, and any number of #args
my ($var, $command, #args) = parse($_);
# Execute the anonymous sub specified by $command with the #args
my $value = $functions{ $command }->(#args);
# Store the return value if a destination $var was specified
$env{ $var } = $value if defined $var;
}
That was fairly trivial. Now comes some parsing code. Perl “binds” regexes to strings with the =~ operator. Regexes may look like /foo/ or m/foo/. The /x flags allows us to include whitespace in our regex that doesn't match actual whitespace. The /g flag matches globally. This also enables the \G assertion. This is where the last successful match ended. The /c flag is important for this m//gc style parsing to consume one match at a time, and to prevent the position of the regex engine in out string to being reset.
sub parse {
my ($line) = #_; # get the $line, which is a argument
my ($var, $command, #args); # declare variables to be filled
# Test if this statement has a variable declaration
if ($line =~ m/\G\s* \$(\w+) \s*=\s* /xgc) {
$var = $1; # assign first capture if successful
}
# Parse the function of this statement.
if ($line =~ m/\G\s* (\w+) \s*/xgc) {
$command = $1;
# Test if the specified function exists in our %functions
if (not exists $functions{$command}) {
die "The command $command is not known\n";
}
} else {
die "Command required\n"; # Throw fatal exception on parse error.
}
# As long as our matches haven't consumed the whole string...
while (pos($line) < length($line)) {
# Try to match variables
if ($line =~ m/\G \$(\w+) \s*/xgc) {
die "The variable $1 does not exist\n" if not exists $env{$1};
push #args, $env{$1};
}
# Try to match strings
elsif ($line =~ m/\G "([^"]+)" \s*/xgc) {
push #args, $1;
}
# Try to match ints or floats
elsif ($line =~ m/\G (\d+ (?:\.\d+)? ) \s*/xgc) {
push #args, 0+$1;
}
# Throw error if nothing matched
else {
die "Didn't understand that line\n";
}
}
# return our -- now filled -- vars.
return $var, $command, #args;
}
Perl arrays can be handled like linked list: shift removes and returns the first element (pop does the same to the last element). push adds an element to the end, unshift to the beginning.
Out little programming language can execute simple programs like:
#!my_little_language
$a = mul 2 20
$b = add 0 2
$answer = add $a $b
say $answer
bye
If (1) our perl script is saved in my_little_language, set to be executable, and is in the system PATH, and (2) the above file in our little language saved as meaning_of_life.mll, and also set to be executable, then
$ ./meaning_of_life
should be able to run it.
Output is obviously 42. Note that our language doesn't yet have string manipulation or simple assignment to variables. Also, it would be nice to be able to call functions with the return value of other functions directly. This requires some sort of parens, or precedence mechanism. Also, the language requires better error reporting for batch processing (which it already supports).

Why doesn't my subroutine return value get assigned to $_ default variable

I have a Perl subroutine which updates an RSS feed. I want to test the returned value, but the function is used in many places so I wanted to just test the default variable $_ which as far as I understand should be the assigned the return value if no variable is specified.
The code is a bit too long to include all of it, but in essence it does the following
sub updateFeed {
#....
if($error) {
return 0;
}
return 1;
}
Why then does
$rtn = updateFeed("My message");
if ($rtn < 1) { &Log("updateFeed Failed with error $rtn"); }
NOT log any error
whereas
updateFeed("myMessage");
if ($_ < 1) { &Log("updateFeed Failed with error $_"); }
logs an error of "updateFeed Failed with error"? (Note no value at the end of the message.)
Can anyone tell me why the default variable seems to contain an empty string or undef?
Because Perl doesn't work that way. $_ doesn't automatically get the result of functions called in void context. There are some built-in operators that read and write $_ and #_ by default, but your own subroutines will only do that if you write code to make it happen.
An ordinary function call is not one of the contexts in which $_ is used implicitly.
Here's what perldoc perlvar (as of v5.14.1) has to say about $_:
$_
The default input and pattern-searching space. The following pairs are equivalent:
while (<>) {...} # equivalent only in while!
while (defined($_ = <>)) {...}
/^Subject:/
$_ =~ /^Subject:/
tr/a-z/A-Z/
$_ =~ tr/a-z/A-Z/
chomp
chomp($_)
Here are the places where Perl will assume $_ even if you don't use it:
The following functions use $_ as a default argument:
abs, alarm, chomp, chop, chr, chroot, cos, defined, eval, exp, glob, hex, int, lc, lcfirst, length, log, lstat, mkdir, oct, ord, pos, print, quotemeta, readlink, readpipe, ref, require,
reverse (in scalar context only), rmdir, sin, split (on its second argument), sqrt, stat, study, uc, ucfirst, unlink, unpack.
All file tests (-f, -d) except for -t, which defaults to STDIN. See -X in perlfunc
The pattern matching operations m//, s/// and tr/// (aka y///) when used without an =~ operator.
The default iterator variable in a foreach loop if no other variable is supplied.
The implicit iterator variable in the grep() and map() functions.
The implicit variable of given().
The default place to put an input record when a <FH> operation's result is tested by itself as the sole criterion of a while test. Outside a while test, this will not happen.
As $_ is a global variable, this may lead in some cases to unwanted side-effects. As of perl 5.9.1, you can now use a lexical version of $_ by declaring it in a file or in a block with my.
Moreover, declaring our $_ restores the global $_ in the current scope.
Mnemonic: underline is understood in certain operations.
You never assigned the flag to $_, so why would it contain your flag? It appears to contain an empty string (or perhaps undef, which stringifies to the empty string with a warning).
$_ isn't by set by subroutines in void context by default. It is possible to write your subs to set $_ when is void context. You start by checking the value of wantarray, and set $_ when wantarray is undefined.
sub updateFeed {
...
my $return
...
if($error) {
$return = 0;
}else{
$return = 1;
}
# $return = !$error || 0;
if( defined wantarray ){ # scalar or list context
return $return;
}else{ # void context
$_ = $return;
}
}
I would recommend against doing this as it can be quite a surprise to someone that is using your subroutine. Which can make it harder to debug their program.
About the only time I would do this, is when emulating a built-in subroutine.

Is there an analogue of Ruby gsub method in Perl? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
How do I perform a Perl substitution on a string while keeping the original?
How do I do one line replacements in Perl without modifying the string itself? I also want it to be usable inside expressions, much like I can do p s.gsub(/from/, 'to') in Ruby.
All I can think of is
do {my $r = $s; $r =~ s/from/to/; $r}
but sure there is a better way?
Starting on the day you feel comfortable writing use 5.14.0 at the top of all of your programs, you can use the s/foo/bar/r variant of the s/// operator, which returns the changed string instead of modifying the original in place (added in perl 5.13.2).
The solution you found with do is not bad, but you can shorten it a little:
do {(my $r = $s) =~ s/from/to/; $r}
It still reveals the mechanics though. You can hide the implementation, and also apply substitutions to lists by writing a subroutine. In most implementations, this function is called apply which you could import from List::Gen or List::MoreUtils or a number of other modules. Or since it is so short, just write it yourself:
sub apply (&#) { # takes code block `&` and list `#`
my ($sub, #ret) = #_; # shallow copy of argument list
$sub->() for #ret; # apply code to each copy
wantarray ? #ret : pop #ret # list in list context, last elem in scalar
}
apply creates a shallow copy of the argument list, and then calls its code block, which is expected to modify $_. The block's return value is not used. apply behaves like the comma , operator. In list context, it returns the list. In scalar context, it returns the last item in the list.
To use it:
my $new = apply {s/foo/bar/} $old;
my #new = apply {s/foo/bar/} qw( foot fool fooz );
From Perl's docs: Regexp-like operators:
($foo = $bar) =~ s/this/that/g; # copy first, then change would match gsub, while
$bar =~ s/this/that/g; # change would match gsub!

How would I do the equivalent of Prototype's Enumerator.detect in Perl with the least amount of code?

Lately I've been thinking a lot about functional programming. Perl offers quite a few tools to go that way, however there's something I haven't been able to find yet.
Prototype has the function detect for enumerators, the descriptions is simply this:
Enumerator.detect(iterator[, context]) -> firstElement | undefined
Finds the first element for which the iterator returns true.
Enumerator in this case is any list while iterator is a reference to a function, which is applied in turn on each element of the list.
I am looking for something like this to apply in situations where performance is important, i.e. when stopping upon encountering a match saves time by disregarding the rest of the list.
I am also looking for a solution that would not involve loading any extra module, so if possible it should be done with builtins only. And if possible, it should be as concise as this for example:
my #result = map function #array;
You say you don't want a module, but this is exactly what the first function in List::Util does. That's a core module, so it should be available everywhere.
use List::Util qw(first);
my $first = first { some condition } #array;
If you insist on not using a module, you could copy the implementation out of List::Util. If somebody knew a faster way to do it, it would be in there. (Note that List::Util includes an XS implementation, so that's probably faster than any pure-Perl approach. It also has a pure-Perl version of first, in List::Util::PP.)
Note that the value being tested is passed to the subroutine in $_ and not as a parameter. This is a convenience when you're using the first { some condition} #values form, but is something you have to remember if you're using a regular subroutine. Some more examples:
use 5.010; # I want to use 'say'; nothing else here is 5.10 specific
use List::Util qw(first);
say first { $_ > 3 } 1 .. 10; # prints 4
sub wanted { $_ > 4 }; # note we're using $_ not $_[0]
say first \&wanted, 1 .. 10; # prints 5
my $want = \&wanted; # Get a subroutine reference
say first \&$want, 1 .. 10; # This is how you pass a reference in a scalar
# someFunc expects a parameter instead of looking at $_
say first { someFunc($_) } 1 .. 10;
Untested since I don't have Perl on this machine, but:
sub first(\&#) {
my $pred = shift;
die "First argument to "first" must be a sub" unless ref $pred eq 'CODE';
for my $val (#_) {
return $val if $pred->($val);
}
return undef;
}
Then use it as:
my $first = first { sub performing test } #list;
Note that this doesn't distinguish between no matches in the list and one of the elements in the list being an undefined value and having that match.
Just since its not here, a Perl function definition of first that localizes $_ for its block:
sub first (&#) {
my $code = shift;
for (#_) {return $_ if $code->()}
undef
}
my #array = 1 .. 10;
say first {$_ > 5} #array; # prints 6
While it will work fine, I don't advocate using this version, since List::Util is a core module (installed by default), and its implementation of first will usually use the XS version (written in C) which is much faster.