Why does Perl create an empty array reference via grep? - perl

I may be missing something obvious, but I'm at a loss as to why the following Perl creates an array reference via grep, but not sort or another general reference?
print #$arr; # no output
print ref $arr; # no output
print scalar #$arr; # no output
print ref $arr; # no output
print sort #$arr; # no output
print ref $arr; # no output
print grep { 0 } #$arr; # no output
print ref $arr; # ARRAY
I'm probably missing something obvious, or maybe it's just one of those things, but it stumped me and I wondered if anyone knew the answer...
I've tested this on Perl 5.8 and 5.10 and get the same behaviour on both.

The interesting question is why don't any of the others. Dereferencing in lvalue context will autovivify the operand if it's undef.
$ perl -E'#$arr = "b"; say $arr // "[undef]"'
ARRAY(0x335b558)
Arguments are always passed by reference to subs, so they are evaluated in lvalue context.
$ perl -E'sub f { } f( #$arr ); say $arr // "[undef]"'
ARRAY(0x284e9f8)
But the "functions" in perlfunc are actually operators, and as such, they get to invent their own syntax and calling conventions. Perl knows that sort won't modify its operands when using the default compare function, so it doesn't evaluate them in lvalue context.
$ perl -E'sort #$arr; say $arr // "[undef]"'
[undef]
grep aliases $_ to each item passed to it, so it its arguments can be modified (even though that's usually not a good idea), so its arguments are evaluated in lvalue context.
$ perl -E'#a = "a"; grep { $_ = uc($_) } #a; say #a'
A

Related

Perl argument parsing into functions when no parentheses are used

I have the following test code:
sub one_argument {
($a) = #_;
print "In one argument: \$a = $a\n";
return "one_argument";
}
sub mul_arguments {
(#a) = #_;
return "mul_argument";
}
print &one_argument &mul_arguments "something", "\n";
My goal is to be able to understand a bit better how perl decides which arguments to go into each function, and to possibly clear up any misunderstandings that I might have. I would've expected the above code to output:
In one argument: mul_argument
one_argument
However, the below is output:
Use of uninitialized value $a in concatenation (.) or string at ./test.pl line 5.
In one argument: $a =
mdd_argument
I don't understand where 'mdd_argument' comes from (Is it a sort of reference to a function?), and why one_argument receives no arguments.
I would appreciate any insight as to how perl parses arguments into functions when they are called in a similar fashion to above.
Please note that this is purely a learning exercise, I don't need the above code to perform as I expected, and in my own code I wouldn't call a function in such a way.
perldoc perlsub:
If a subroutine is called using the & form, the argument list is optional, and if omitted, no #_ array is set up for the subroutine: the #_ array at the time of the call is visible to subroutine instead. This is an efficiency mechanism that new users may wish to avoid.
In other words, in normal usage, if you use the &, you must use parentheses. Otherwise, the subroutine will be passed the caller's #_.
The mysterious "mdd" is caused because &one_argument doesn't have any arguments and perl is expecting an operator to follow it, not an expression. So the & of &mul_arguments is actually interpreted as the stringwise bit and operator:
$ perl -MO=Deparse,-p -e 'sub mul_arguments; print &one_argument &mul_arguments "something", "\n"'
print((&one_argument & mul_arguments('something', "\n")));
and "one_argument" & "mul_arguments" produces "mdd_argument".

Strange perl code - looking for explanation

I know a little bit perl, but not enough deeply to understand the next.
Reading perldelta 5.18 i found the next piece of code what is already disabled in 5.18. Not counting this, still want understand how it's works.
Here is the code and in the comments are what i understand
%_=(_,"Just another "); #initialize the %_ hash with key=>value _ => 'Just another'
$_="Perl hacker,\n"; #assign to the $_ variable with "Perl..."
s//_}->{_/e; # darkness. the /e - evauates the expression, but...
print
it prints:
Just another Perl hacker,
I tried, the perl -MO=Deparse and get the next
(%_) = ('_', 'Just another '); #initializing the %_ hash
$_ = "Perl hacker,\n"; # as above
s//%{'_';}/e; # substitute to the beginning of the $_ - WHAT?
print $_; # print the result
japh syntax OK
What is strange (at least for me) - running the "deparsed" code doesn't gives the original result and prints:
1/8Perl hacker,
I would be very happy:
if someone can explain the code, especially if someone could write an helper code, (with additional steps) what helps me understand how it is works - what happens.
explain, why the deparsed code not prints the original result.
What means the %{'_';} in the deparsed code?
The code actually executed by the substitution operator is probably actually something like
my $code = "do { $repl_expr }";
So when the replacement expression is _}->{_, the following is executed:
do { _}->{_ }
_ simply returns the string _ (since strict is off), so that's the same as
do { "_" }->{_}
which is the same as
"_"->{_}
What you have there is a hash element dereference, where the reference is a symbolic reference (i.e. a string rather than an actual reference). Normally forbidden by strict, here's an example of a symbolic reference at work:
%h1 = ( id => 123 );
%h2 = ( id => 456 );
print "h$_"->{id}, "\n"
for 1..2;
So that means
"_"->{_} # Run-time symbol lookup
is the same as
$_{_} # Compile-time symbol lookup
A similar trick is often used in one-liners.
perl -nle'$c += $_; END { print $c }'
can be shortened to
perl -nle'$c += $_; }{ print $c'
Because the code actually executed when -n is used is obtained from something equivalent to
my $code = "LINE: while (<>) { $program }";
%{'_';}
is a very weird way to write
%{'_'}
which is a hash dereference. Again, the reference here is a symbolic reference. It's equivalent to
%_
In scalar context, hash current returns a value that reports some information about that hash's internals (or a false value if empty). There's been a suggestion to change it to return the number of keys instead.

Why doesn't my subroutine return value get assigned to $_ default variable

I have a Perl subroutine which updates an RSS feed. I want to test the returned value, but the function is used in many places so I wanted to just test the default variable $_ which as far as I understand should be the assigned the return value if no variable is specified.
The code is a bit too long to include all of it, but in essence it does the following
sub updateFeed {
#....
if($error) {
return 0;
}
return 1;
}
Why then does
$rtn = updateFeed("My message");
if ($rtn < 1) { &Log("updateFeed Failed with error $rtn"); }
NOT log any error
whereas
updateFeed("myMessage");
if ($_ < 1) { &Log("updateFeed Failed with error $_"); }
logs an error of "updateFeed Failed with error"? (Note no value at the end of the message.)
Can anyone tell me why the default variable seems to contain an empty string or undef?
Because Perl doesn't work that way. $_ doesn't automatically get the result of functions called in void context. There are some built-in operators that read and write $_ and #_ by default, but your own subroutines will only do that if you write code to make it happen.
An ordinary function call is not one of the contexts in which $_ is used implicitly.
Here's what perldoc perlvar (as of v5.14.1) has to say about $_:
$_
The default input and pattern-searching space. The following pairs are equivalent:
while (<>) {...} # equivalent only in while!
while (defined($_ = <>)) {...}
/^Subject:/
$_ =~ /^Subject:/
tr/a-z/A-Z/
$_ =~ tr/a-z/A-Z/
chomp
chomp($_)
Here are the places where Perl will assume $_ even if you don't use it:
The following functions use $_ as a default argument:
abs, alarm, chomp, chop, chr, chroot, cos, defined, eval, exp, glob, hex, int, lc, lcfirst, length, log, lstat, mkdir, oct, ord, pos, print, quotemeta, readlink, readpipe, ref, require,
reverse (in scalar context only), rmdir, sin, split (on its second argument), sqrt, stat, study, uc, ucfirst, unlink, unpack.
All file tests (-f, -d) except for -t, which defaults to STDIN. See -X in perlfunc
The pattern matching operations m//, s/// and tr/// (aka y///) when used without an =~ operator.
The default iterator variable in a foreach loop if no other variable is supplied.
The implicit iterator variable in the grep() and map() functions.
The implicit variable of given().
The default place to put an input record when a <FH> operation's result is tested by itself as the sole criterion of a while test. Outside a while test, this will not happen.
As $_ is a global variable, this may lead in some cases to unwanted side-effects. As of perl 5.9.1, you can now use a lexical version of $_ by declaring it in a file or in a block with my.
Moreover, declaring our $_ restores the global $_ in the current scope.
Mnemonic: underline is understood in certain operations.
You never assigned the flag to $_, so why would it contain your flag? It appears to contain an empty string (or perhaps undef, which stringifies to the empty string with a warning).
$_ isn't by set by subroutines in void context by default. It is possible to write your subs to set $_ when is void context. You start by checking the value of wantarray, and set $_ when wantarray is undefined.
sub updateFeed {
...
my $return
...
if($error) {
$return = 0;
}else{
$return = 1;
}
# $return = !$error || 0;
if( defined wantarray ){ # scalar or list context
return $return;
}else{ # void context
$_ = $return;
}
}
I would recommend against doing this as it can be quite a surprise to someone that is using your subroutine. Which can make it harder to debug their program.
About the only time I would do this, is when emulating a built-in subroutine.

How to determine if Java process is running in Perl

I have a suite of small Java app that all compiles/packages to <name-of-the-app>.jar and run on my server. Occasionally one of them will throw an exception, choke and die. I am trying to write a quick-n-dirty Perl script that will periodically poll to see if all of these executable JARs are still running, and if any of them are not, send me an email and notify me which one is dead.
To determine this manually, I have to run a ps -aef | grep <name-of-app> for each app I want to check. For example, to see if myapp.jar is running as a process, I run ps -aef | grep myapp, and look for a grep result that describes the JVM process representing myapp.jar. This manual checking is now getting tedious and is a prime candidate for automation!
I am trying to implement the code that checks to see whether a process is running or not. I'd like to make this a sub that accepts the name of the executable JAR and returns true or false:
sub isAppStillRunning($appName) {
# Somehow run "ps -aef | grep $appName"
# Somehow count the number of processes returned by the grep
# Since grep always returns itself, determine if (count - 1) == 1.
# If so, return true, otherwise, false.
}
I need to be able to pass the sub the name of an app, run my normal command, and count the number of results returned by grep. Since running a grep always results in at least one result (the grep command itself), I need logic that says if the (# of results - 1) is equal to 1, then we know the app is running.
I'm brand new to Perl and am having a tough time figuring out how to implement this logic. Here's my best attempt so far:
sub isAppStillRunning($appName) {
# Somehow run "ps -aef | grep $appName"
#grepResults = `ps -aef | grep $appName`;
# Somehow count the number of processes returned by the grep
$grepResultCount = length(#grepResults);
# Since grep always returns itself, determine if (count - 1) == 1.
# If so, return true, otherwise, false.
if(($grepResultCount - 1) == 1)
true
else
false
}
Then to call the method, from inside the same Perl script, I think I would just run:
&isAppStillRunning("myapp");
Any help with defining the sub and then calling it with the right app name is greatly appreciated. Thanks in advance!
It would be about a billion times easier to use the Proc::ProcessTable module from CPAN. Here's an example of what it might look like:
use strict;
use warnings;
use Proc::ProcessTable;
...
sub isAppStillRunning {
my $appname = shift;
my $pt = Proc::ProcessTable->new;
my #procs = grep { $_->fname =~ /$appname/ } #{ $pt->table };
if ( #procs ) {
# we've got at least one proc matching $appname. Hooray!
} else {
# everybody panic!
}
}
isAppStillRUnning( "myapp" );
Some notes to keep in mind:
Turn on strict and warnings. They are your friends.
You don't specify subroutine arguments with prototypes. (Prototypes in Perl do something completely different, which you don't want.) Use shift to get arguments off the #_ array.
Don't use & to call subroutines; just use its name.
An array evaluated in scalar context (including if its inside an if) gives you its size. length doesn't work on arrays.
Your sub is almost there, but the final if-else construct has to be corrected, and in some cases Perl idiom can make your life easier.
Perl Has Prototypes, But They Suck
sub isAppStillRunning($appName) {
will not work. Instead use
sub isAppStillRunning {
my ($appName) = #_;
The #_ array holds the arguments to the function.
Perl has some simple prototypes (the sub name(&$#) {...} syntax), but they are broken, and an advanced topic, so don't use them.
Perl Has Built-In Grep
`ps -aef | grep $appName`;
This returns one (1) string, possibly containing multiple lines. You could split the output at newlines, and grep manually, which is safer than interpolating variables:
my #lines = split /\n/ `ps -aef`;
my #grepped = grep /$appName/, #lines;
You could also use the open function to explicitly open a pipe to ps:
my #grepped = ();
open my $ps, '-|', 'ps -aef' or die "can't invocate ps";
while (<$ps>) {
push #grepped if /$appName/;
}
This is exactly equal, but better style. It reads all lines from the ps output and then pushes all lines with your $appName into the #grepped array.
Scalar vs. List Context
Perl has this unusual thing called "context". There is list context and scalar context. For example, subroutine calls take argument lists - so these lists (usually) have list context. Concatenating two strings is a scalar context, in contrast.
Arrays behave differently depending on their context. In list context, they evaluate to their elements, but in scalar context, they evaluate to the number of their elements. So there is no need to manually count elements (or use the length function that works on strings).
So we have:
my #array = (1, 2, 3);
print "Length: ", scalar(#array), "\n"; # prints "Length: 3\n"
print "Elems: ", #array, "\n"; # prints "Elems: 123\n";
print "LastIdx: ", $#array, "\n"; # prints "LastIdx: 2\n";
The last form, $#array, is the last index in the array. Unless you meddle with special variables, this is the same as #array - 1.
The scalar function forces scalar context.
Perl Has No Booleans
Perl has no boolean data type, and therefore no true or false keywords. Instead, all values are true, unless stated otherwise. False values are:
The empty string "", the number zero 0, the string zero "0", the special value undef, and some other oddities you won't run into.
So generally use 1 as true and 0 as false.
The if/else constructs require curly braces
So you probably meant:
if (TEST) {
return 1;
} else {
return 0;
}
which is the same as return TEST, where TEST is a condition.
The Ultimate reduction
Using these tricks, your sub could be written as short as
sub isAppStillRunning {
return scalar grep /$_[0]/, (split /\n/, `ps -aef`);
}
This returns the number of lines that contain your app name.
You could modify your routine like this:
sub isAppRunning {
my $appName = shift;
#grepResults = `ps -aef | grep $appName`;
my $items = 0;
for $item(#grepResults){
$items++;
}
return $items;
}
This will iterate over the #grepResults and allow you to inspect the $item if necessary.
Calling it like this should return the number of processes:
print(isAppRunning('myapp') . "\n");

Where is $_ being modified in this perl code?

The following perl code generates a warning in PerlCritic (by Activestate):
sub natural_sort {
my #sorted;
#sorted = grep {s/(^|\D)0+(\d)/$1$2/g,1} sort grep {s/(\d+)/sprintf"%06.6d",$1/ge,1} #_;
}
The warning generated is:
Don't modify $_ in list functions
More info about that warning here
I don't understand the warning because I don't think I'm modifying $_, although I suppose I must be.
Can someone explain it to me please?
Both of your greps are modifying $_ because you're using s//. For example, this:
grep {s/(^|\D)0+(\d)/$1$2/g,1}
is the same as this:
grep { $_ =~ s/(^|\D)0+(\d)/$1$2/g; 1 }
I think you'd be better off using map as you are not filtering anything with your greps, you're just using grep as an iterator:
sub natural_sort {
my $t;
return map { ($t = $_) =~ s/(^|\D)0+(\d)/$1$2/g; $t }
sort
map { ($t = $_) =~ s/(\d+)/sprintf"%06.6d",$1/ge; $t }
#_;
}
That should do the same thing and keep critic quiet. You might want to have a look at List::MoreUtils if you want some nicer list operators than plain map.
You are doing a substitution ( i.e. s/// ) in the grep, which modifies $_ i.e. the list being grepped.
This and other cases are explained in perldoc perlvar:
Here are the places where Perl will
assume $_ even if you don't use it:
The following functions:
abs, alarm, chomp, chop, chr, chroot,
cos, defined, eval, exp, glob, hex,
int, lc, lcfirst, length, log, lstat,
mkdir, oct, ord, pos, print,
quotemeta, readlink, readpipe, ref,
require, reverse (in scalar context
only), rmdir, sin, split (on its
second argument), sqrt, stat, study,
uc, ucfirst, unlink, unpack.
All file tests (-f , -d ) except for -t , which defaults to STDIN.
See -X
The pattern matching operations m//, s/// and tr/// (aka y///) when
used without an =~ operator.
The default iterator variable in a foreach loop if no other variable is
supplied.
The implicit iterator variable in the grep() and map() functions.
The implicit variable of given().
The default place to put an input record when a operation's result
is tested by itself as the sole
criterion of a while test. Outside a
while test, this will not happen.
Many people have correctly answered that the s operator is modifying $_, however in the soon to be released Perl 5.14.0 there will be the new r flag for the s operator (i.e. s///r) which rather than modify in-place will return the modified elements. Read more at The Effective Perler . You can use perlbrew to install this new version.
Edit: Perl 5.14 is now available! Announcement Announcement Delta
Here is the function suggested by mu (using map) but using this functionality:
use 5.14.0;
sub natural_sort {
return map { s/(^|\D)0+(\d)/$1$2/gr }
sort
map { s/(\d+)/sprintf"%06.6d",$1/gre }
#_;
}
The VERY important part that other answers have missed is that the line
grep {s/(\d+)/sprintf"%06.6d",$1/ge,1} #_;
Is actually modifying the arguments passed into the function, and not copies of them.
grep is a filtering command, and the value in $_ inside the code block is an alias to one of the values in #_. #_ in turn contains aliases to the arguments passed to the function, so when the s/// operator performs its substitution, the change is being made to the original argument. This is shown in the following example:
sub test {grep {s/a/b/g; 1} #_}
my #array = qw(cat bat sat);
my #new = test #array;
say "#new"; # prints "cbt bbt sbt" as it should
say "#array"; # prints "cbt bbt sbt" as well, which is probably an error
The behavior you are looking for (apply a function that modifies $_ to a copy of a list) has been encapsulated as the apply function in a number of modules. My module List::Gen contains such an implementation. apply is also fairly simple to write yourself:
sub apply (&#) {
my ($sub, #ret) = #_;
$sub->() for #ret;
wantarray ? #ret : pop #ret
}
With that, your code could be rewritten as:
sub natural_sort {
apply {s/(^|\D)0+(\d)/$1$2/g} sort apply {s/(\d+)/sprintf"%06.6d",$1/ge} #_
}
If your goal with the repeated substitutions is to perform a sort of the original data with a transient modification applied, you should look into a Perl idiom known as the Schwartzian transform which is a more efficient way of achieving that goal.