redefining or overloading backticks - perl

I have a lot of legacy code which shells out a lot, what i want to do is add a require or minimal code changes to make the backticks do something different, for instance print instead of running code
i tried using use subs but i couldn't get it to take over backticks or qx (i did redefine system which is one less thing to worry about)
i also tried to make a package:
package thingmbob;
use Data::Dumper;
use overload '``' => sub { CORE::print "things!:\t", Dumper \#_};
#this works for some reason
$thingmbob::{'(``'}('ls');
#this does the standard backtick operation
`ls`
unfourtunatly, I have no experience in OOP perl and my google-fu skills are failing me, could some one point me in the right direction?
caveat:
I'm in a closed system with a few cpan modules preinstalled, odds are that i don't have any fancy modules preinstalled and i absolutely cannot get new ones
I'm on perl5.14
edit:
for the sake of completeness i want to add my (mostly) final product
BEGIN {
*CORE::GLOBAL::readpipe = sub {
print Dumper(\#_);
#internal = readpipe(#_);
if(wantarray){
return #internal;
}else{
return join('',#internal);
}
};
}
I want it to print what its about to run and then run it. the wantarray is important because without it scalar context does not work

This perlmonks article explains how to do it. You can overwrite the readpipe built-in.
EXPR is executed as a system command. The collected standard output of the command is returned. In scalar context, it comes back as a single (potentially multi-line) string. In list context, returns a list of lines (however you've defined lines with $/ (or $INPUT_RECORD_SEPARATOR in English)). This is the internal function implementing the qx/EXPR/ operator, but you can use it directly. The qx/EXPR/ operator is discussed in more detail in I/O Operators in perlop. If EXPR is omitted, uses $_ .
You need to put this into a BEGIN block, so it would make sense to not require, but use it instead to make it available as early as possible.
Built-ins are overridden using the CORE::GLOBAL:: namespace.
BEGIN {
*CORE::GLOBAL::readpipe = sub {
print "#_";
}
}
print qx/ls/;
print `ls`;
This outputs:
ls1ls1
Where the ls is the #_ and the 1 is the return value of print inside the overridden sub.
Alternatively, there is ex::override, which does the same under the hood, but with less weird internals.

Related

In perl, what does a parenthesized list of '$' mean in a sub declaration?

I have to debug someone else's code and ran across sub declarations that look like this...
sub mysub($$$$) {
<code here>
}
...also...
sub mysub($$$;$) {
<code here>
}
What does the parenthesized list of '$' (with optional ';') mean?
I ran an experiment and it doesn't seem to care if I pass more and fewer args to a sub declared this way than there are '$' in the list. I was thinking that it might be used to disambiguate two different subs with the same name, differring only by the number of args pased to it (as defined by the ($$$$) vs ($$$) vs ($$) etc... ). But that doesn't seem to be it.
That's a Perl subroutine prototype. It's an old-school way of letting the parser know how many arguments to demand. Unless you know what they are going to do for you, I suggest you avoid these for any new code. If you can avoid prototypes, avoid it. It doesn't gain you as much as you think. There's a newer but experimental way to do it better.
The elements after the ; are optional arguments. So, mysub($$$$) has four mandatory arguments, and mysub($$$;$) has three mandatory arguments and one optional argument.
A little about parsing
Perl lets you be a bit loose about parentheses when you want to specify arguments, so these are the same:
print "Hello World";
print( "Hello World\n" );
This is one of Perl's philosophical points. When we can omit boilerplate, we should be able to.
Also, Perl lets you pass as many arguments as you like to a subroutine and you don't have to say anything about parameters ahead of time:
sub some_sub { ... }
some_sub( 1, 2, 3, 4 );
some_sub 1, 2, 3, 4; # same
This is another foundational idea of Perl: we have scalars and lists. Many things work on a list, and we don't care what's in it or how many elements it has.
But, some builtins take a definite number of arguments. The sin takes exactly one argument (but print takes zero to effectively infinity):
print sin 5, 'a'; # -0.958924274663138a (a is from `a`)
The rand takes zero or one:
print rand; # 0.331390818188996
print rand 10; # 4.23956650382937
But then, you can define your own subroutines. Prototypes are a way to mimic that same behavior you see in the builtins (which I think is kinda cool but also not as motivating for production situations).
I tend to use parens in argument lists because I find it's easier for people to see what I intend (although not always with print, I guess):
print sin(5), 'a';
There's one interesting use of prototypes that I like. You can make your own syntax that works like map and grep block forms:
map { ... } #array;
If you want to play around with that (but still not subject maintenance programmers to it), check out Object::Iterate for a demonstration of it.
Experimental signatures
Perl v5.20 introduced an experimental signatures feature where you can give names to parameters. All of these are required:
use v5.20;
use feature qw(signatures);
sub mysub ( $name, $address, $phone ) { ... }
If you wanted an optional parameter, you can give it a default value:
sub mysub ( $name, $address, $phone = undef ) { ... }
Since this is an experimental feature, it warns whenever you use it. You can turn it off though:
no warnings qw(experimental::signatures);
This is interesting.
I ran an experiment and it doesn't seem to care if I pass more and fewer args to a sub declared this way than there are '$' in the list.
Because, of course, that's exactly what the code's author was trying to enforce.
There are two ways to circumvent the parameter counting that prototypes are supposed to enforce.
Call the subroutine as a method on an object ($my_obj->my_sub(...)) or on a class (MyClass->my_sub(...)).
Call the subroutine using the "old-style" ampersand syntax (&my_sub(...)).
From which we learn:
Don't use prototypes on subroutines that are intended to be used as methods.
Don't use the ampersand syntax for calling subroutines.

What is wrong in "my $foo = $x if $y" syntax?

In my last question here, #amon gave an great answer. However, he told too:
First of all, please don't do my $foo = $x if $y. You get unexpected
and undefined behavior, so it is best to avoid that syntax.
Because the above construction I was see in really many sources in the CPAN, I'm wondering how, when, where can be it wrong. (Some example code would be nice). Wondering too, why perl allows it, if it is bad.
His wording was actually a bit laxer. That wording is actually mine. Let's start with the documentation: (Emphasis in original)
NOTE: The behaviour of a my, state, or our modified with a statement modifier conditional or loop construct (for example, my $x if ...) is undefined. The value of the my variable may be undef, any previously assigned value, or possibly anything else. Don't rely on it. Future versions of perl might do something different from the version of perl you try it out on. Here be dragons.
To be more precise, the problem is using a lexical variable when its my may not have been executed.
Consider:
# Usage:
# f($x) # Store a value
# f() # Fetch and clear the stored value
sub f {
my $x if !#_;
if (#_) {
$x = $_[0];
} else {
return $x;
}
}
f('abc');
say "<", f(), ">" # abc
This is obviously not the documented behaviour of my.
Because the above construction I was see in really many sources in the CPAN
That code is buggy. If you want a value to persist between calls to a sub, you can use a state variable since Perl 5.10, or a variable outside of the sub.

Perl subroutine arguments

I have been reading about Perl recently and am slightly perplexed about how Perl handles arguments passed to subroutines.
In a language like Python, Java or PHP, a function definition takes the form (in pseudocode):
function myFunc(arg1, arg2) {
// Do something with arg1 and arg2 here
}
Yet in Perl it's just:
sub mySub {
# #_ holds all arguments passed
}
And as I understand it, that's the only way to do it.
What if I want to restrict the caller to only pass two arguments?
Isn't this just Perl not allowing anything but variable-number arguments in other languages (i.e., Python, C, etc.)?
Wouldn't that become a problem at some point?
What about all the default argument-number checking in other languages? Would one have to do that explicitly in Perl? For instance
sub a_sub {
if (#_ == 2) {
# Continue function
}
else {
return false
}
}
You are wary of the Perl environment because it is quite different from the languages you have come across before.
The people who believe in strong typing and function prototypes will disagree here, but I believe that restrictions like that are rarely useful. Has C really caught you passing the wrong number of parameters to a function often enough to be useful?
It is most common in modern Perl to copy the contents of #_ to a list of lexical scalar variables, so you will often see subroutines starting with
sub mysub {
my ($p1, $p2) = #_;
... etc.
}
that way, all parameters that are passed will be available as elements of #_ ($_[0], $_[1] etc.) while the expected ones are named and appear in $p1 and $p2 (although I hope you understand that those names should be chosen appropriately).
In the particular case that the subroutine is a method, the first parameter is special. In other languages it is self or this, but in Perl it is simply the first parameter in #_ and you may call it what you like. In those circumstances you would see
sub method {
my $self = shift;
my ($p1, $p2) = #_;
... etc.
}
so that the context object (or the name of the class if it is a class method) is extracted into $self (a name assumed by convention) and the rest of the parameters remain in #_ to be accessed either directly or, more usually, copied to local scalar variables as $p1, $p2 etc.
Most often the complaint is that there is no type checking either, so I can pass any scalar I like as a subroutine parameter. As long as use strict and use warnings are in context, even this is generally simple to debug, simply because the operations that the subroutine can perform on one form of scalar are usually illegal on another.
Although it was originally more to do with encapsulation with respect to object-oriented Perl, this quote from Larry Wall is very relevant
Perl doesn't have an infatuation with enforced privacy. It would prefer that you stayed out of its living room because you weren't invited, not because it has a shotgun
C was designed and implemented in the days when it was a major efficiency boost if you could get a faulty program to fail during compilation rather than at run time. That has changed now, although a similar situation has arisen with client-side JavaScript where it actually would be useful to know that the code is wrong before fetching the data from the internet that it has to deal with. Sadly, JavaScript parameter checking is now looser than it should be.
Update
For those who doubt the usefulness of Perl for teaching purposes, I suggest that it is precisely because Perl's mechanisms are so simple and direct that they are ideal for such purposes.
When you call a Perl subroutine all of the parameters in the call are aliased in #_. You can use them directly to affect the actual parameters, or copy them to prevent external action
If you call a Perl subroutine as a method then the calling object or class is provided as the first parameter. Again, the subroutine (method) can do what it likes with #_
Perl doesn't manage your argument handling for you. Instead, it provides a minimal, flexible abstraction and allows you to write code that fits your needs.
Pass By Reference
By default, Perl sticks an alias to each argument in #_. This implements basic, pass by reference semantics.
my $num = 1;
foo($num);
print "$num\n"; # prints 2.
sub foo { $_[0]++ }
Pass by reference is fast but has the risk of leaking changes to parameter data.
Pass By Copy
If you want pass by copy semantics, you need to make the copies yourself. Two main approaches to handling lists of positional parameters are common in the Perl community:
sub shifty {
my $foo = shift;
}
sub listy {
my ($foo) = #_;
}
At my place of employment we do a version of listy:
sub fancy_listy {
my ($positional, $args, #bad) = #_;
die "Extra args" if #bad;
}
Named Parameters
Another common practice is the use of named parameters:
sub named_params {
my %opt = #_;
}
Some people are happy with just the above. I prefer a more verbose approach:
sub named_params {
my %opt = #_;
my $named = delete $opt{named} // "default value";
my $param = delete $opt{param}
or croak "Missing required 'param'";
croak "Unknown params:", join ", ", keys %opt
if %opt;
# do stuff
}
This unpacks named params into variables, allows space for basic validation and default values and enforces that no extra, unknown arguments were passed in.
On Perl Prototypes
Perl's "prototypes" are not prototypes in the normal sense. They only provide compiler hints that allow you to skip parenthesis on function calls. The only reasonable use is to mimic the behavior of built-in functions. You can easily defeat prototype argument checking. In general, DO NOT USE PROTOTYPES. Use them with with care that you would use operator overloading--i.e. sparingly and only to improve readability.
For some reason, Perl likes lists, and dislikes static typing. The #_ array actually opens up a lot of flexibility, because subroutine arguments are passed by reference, and not by value. For example, this allows us to do out-arguments:
my $x = 40;
add_to($x, 2);
print "$x\n"; # 42
sub add_to { $_[0] += $_[1] }
… but this is more of an historic performance hack. Usually, the arguments are “declared” by a list assignment:
sub some_sub {
my ($foo, $bar) = #_;
# ^-- this assignment performs a copy
...
}
This makes the semantics of this sub call-by-value, which is usually more desirable. Yes, unused arguments are simply forgotten, and too few arguments do not raise any automatic error – the variables just contain undef. You can add arbitrary validation e.g. by checking the size of #_.
There exist plans to finally make named parameters available in the future, which would look like
sub some_sub($foo, $bar) { ... }
You can have this syntax today if you install the signatures module. But there is something even better: I can strongly recommend Function::Parameters, which allows syntax like
fun some_sub($foo, $bar = "default value") { ... }
method some_method($foo, $bar, :$named_parameter, :$named_with_default = 42) {
# $self is autodeclared in methods
}
This also supports experimental type checks.
Parser extensions FTW!
If you really want to impose stricter parameter checks in Perl, you could look at something like Params::Validate.
Perl does have the prototyping capability for parameter placeholders, that you're kind of used to seeing, but it's often unnecessary.
sub foo($){
say shift;
};
foo(); # Error: Not enough arguments for main::foo
foo('bar'); # executes correctly
And if you did sub foo($$){...} it would require 2 non-optional arguments (eg foo('bar','baz'))
You can just use:
my ($arg1, $arg2) = #_;
To explicitly limit the number of arguments you can use:
my $number =2;
die "Too many arguments" if #_ > $number;
If you are reading about Perl recently, please read about recent Perl. You may read the Modern Perl book for free as well.
Indeed, while with old standard Perl you would need to restrict the number of arguments passed to a subroutine manually, for example with something like this:
sub greet_one {
die "Too many arguments for subroutine" unless #_ <= 1;
my $name = $_[0] || "Bruce";
say "Hello, $name!";
}
With modern Perl you can take advantage of function signatures.
Here are a few examples from the Modern Perl book:
sub greet_one($name = 'Bruce') {
say "Hello, $name!";
}
sub greet_all($leader, #everyone) {
say "Hello, $leader!";
say "Hi also, $_." for #everyone;
}
sub make_nested_hash($name, %pairs) {
return { $name => \%pairs };
}
Please, note that function signatures were introduced in Perl 5.20 and considered experimental until Perl 5.36.
Therefore, if you use a Perl version in that range you may want to disable warnings for the "experimental::signatures" category:
use feature 'signatures';
no warnings 'experimental::signatures';

perl encapsulate single variable in double quotes

In Perl, is there any reason to encapsulate a single variable in double quotes (no concatenation) ?
I often find this in the source of the program I am working on (writen 10 years ago by people that don't work here anymore):
my $sql_host = "something";
my $sql_user = "somethingelse";
# a few lines down
my $db = sub_for_sql_conection("$sql_host", "$sql_user", "$sql_pass", "$sql_db");
As far as I know there is no reason to do this. When I work in an old script I usualy remove the quotes so my editor colors them as variables not as strings.
I think they saw this somewhere and copied the style without understanding why it is so. Am I missing something ?
Thank you.
All this does is explicitly stringify the variables. In 99.9% of cases, it is a newbie error of some sort.
There are things that may happen as a side effect of this calling style:
my $foo = "1234";
sub bar { $_[0] =~ s/2/two/ }
print "Foo is $foo\n";
bar( "$foo" );
print "Foo is $foo\n";
bar( $foo );
print "Foo is $foo\n";
Here, stringification created a copy and passed that to the subroutine, circumventing Perl's pass by reference semantics. It's generally considered to be bad manners to munge calling variables, so you are probably okay.
You can also stringify an object or other value here. For example, undef stringifies to the empty string. Objects may specify arbitrary code to run when stringified. It is possible to have dual valued scalars that have distinct numerical and string values. This is a way to specify that you want the string form.
There is also one deep spooky thing that could be going on. If you are working with XS code that looks at the flags that are set on scalar arguments to a function, stringifying the scalar is a straight forward way to say to perl, "Make me a nice clean new string value" with only stringy flags and no numeric flags.
I am sure there are other odd exceptions to the 99.9% rule. These are a few. Before removing the quotes, take a second to check for weird crap like this. If you do happen upon a legit usage, please add a comment that identifies the quotes as a workable kludge, and give their reason for existence.
In this case the double quotes are unnecessary. Moreover, using them is inefficient as this causes the original strings to be copied.
However, sometimes you may want to use this style to "stringify" an object. For example, URI ojects support stringification:
my $uri = URI->new("http://www.perl.com");
my $str = "$uri";
I don't know why, but it's a pattern commonly used by newcomers to Perl. It's usually a waste (as it is in the snippet you posted), but I can think of two uses.
It has the effect of creating a new string with the same value as the original, and that could be useful in very rare circumstances.
In the following example, an explicit copy is done to protect $x from modification by the sub because the sub modifies its argument.
$ perl -E'
sub f { $_[0] =~ tr/a/A/; say $_[0]; }
my $x = "abc";
f($x);
say $x;
'
Abc
Abc
$ perl -E'
sub f { $_[0] =~ tr/a/A/; say $_[0]; }
my $x = "abc";
f("$x");
say $x;
'
Abc
abc
By virtue of creating a copy of the string, it stringifies objects. This could be useful when dealing with code that alters its behaviour based on whether its argument is a reference or not.
In the following example, explicit stringification is done because require handles references in #INC differently than strings.
$ perl -MPath::Class=file -E'
BEGIN { $lib = file($0)->dir; }
use lib $lib;
use DBI;
say "ok";
'
Can't locate object method "INC" via package "Path::Class::Dir" at -e line 4.
BEGIN failed--compilation aborted at -e line 4.
$ perl -MPath::Class=file -E'
BEGIN { $lib = file($0)->dir; }
use lib "$lib";
use DBI;
say "ok";
'
ok
In your case quotes are completely useless. We can even says that it is wrong because this is not idiomatic, as others wrote.
However quoting a variable may sometime be necessary: this explicitely triggers stringification of the value of the variable. Stringification may give a different result for some values if thoses values are dual vars or if they are blessed values with overloaded stringification.
Here is an example with dual vars:
use 5.010;
use strict;
use Scalar::Util 'dualvar';
my $x = dualvar 1, "2";
say 0+$x;
say 0+"$x";
Output:
1
2
My theory has always been that it's people coming over from other languages with bad habits. It's not that they're thinking "I will use double quotes all the time", but that they're just not thinking!
I'll be honest and say that I used to fall into this trap because I came to Perl from Java, so the muscle memory was there, and just kept firing.
PerlCritic finally got me out of the habit!
It definitely makes your code more efficient, but if you're not thinking about whether or not you want your strings interpolated, you are very likely to make silly mistakes, so I'd go further and say that it's dangerous.

How can I make a static analysis call graph for Perl?

I am working on a moderately complex Perl program. As a part of its development, it has to go through modifications and testing. Due to certain environment constraints, running this program frequently is not an option that is easy to exercise.
What I want is a static call-graph generator for Perl. It doesn't have to cover every edge case(e,g., redefining variables to be functions or vice versa in an eval).
(Yes, I know there is a run-time call-graph generating facility with Devel::DprofPP, but run-time is not guaranteed to call every function. I need to be able to look at each function.)
Can't be done in the general case:
my $obj = Obj->new;
my $method = some_external_source();
$obj->$method();
However, it should be fairly easy to get a large number of the cases (run this program against itself):
#!/usr/bin/perl
use strict;
use warnings;
sub foo {
bar();
baz(quux());
}
sub bar {
baz();
}
sub baz {
print "foo\n";
}
sub quux {
return 5;
}
my %calls;
while (<>) {
next unless my ($name) = /^sub (\S+)/;
while (<>) {
last if /^}/;
next unless my #funcs = /(\w+)\(/g;
push #{$calls{$name}}, #funcs;
}
}
use Data::Dumper;
print Dumper \%calls;
Note, this misses
calls to functions that don't use parentheses (e.g. print "foo\n";)
calls to functions that are dereferenced (e.g. $coderef->())
calls to methods that are strings (e.g. $obj->$method())
calls the putt the open parenthesis on a different line
other things I haven't thought of
It incorrectly catches
commented functions (e.g. #foo())
some strings (e.g. "foo()")
other things I haven't thought of
If you want a better solution than that worthless hack, it is time to start looking into PPI, but even it will have problems with things like $obj->$method().
Just because I was bored, here is a version that uses PPI. It only finds function calls (not method calls). It also makes no attempt to keep the names of the subroutines unique (i.e. if you call the same subroutine more than once it will show up more than once).
#!/usr/bin/perl
use strict;
use warnings;
use PPI;
use Data::Dumper;
use Scalar::Util qw/blessed/;
sub is {
my ($obj, $class) = #_;
return blessed $obj and $obj->isa($class);
}
my $program = PPI::Document->new(shift);
my $subs = $program->find(
sub { $_[1]->isa('PPI::Statement::Sub') and $_[1]->name }
);
die "no subroutines declared?" unless $subs;
for my $sub (#$subs) {
print $sub->name, "\n";
next unless my $function_calls = $sub->find(
sub {
$_[1]->isa('PPI::Statement') and
$_[1]->child(0)->isa("PPI::Token::Word") and
not (
$_[1]->isa("PPI::Statement::Scheduled") or
$_[1]->isa("PPI::Statement::Package") or
$_[1]->isa("PPI::Statement::Include") or
$_[1]->isa("PPI::Statement::Sub") or
$_[1]->isa("PPI::Statement::Variable") or
$_[1]->isa("PPI::Statement::Compound") or
$_[1]->isa("PPI::Statement::Break") or
$_[1]->isa("PPI::Statement::Given") or
$_[1]->isa("PPI::Statement::When")
)
}
);
print map { "\t" . $_->child(0)->content . "\n" } #$function_calls;
}
I'm not sure it is 100% feasible (since Perl code can not be statically analyzed in theory, due to BEGIN blocks and such - see very recent SO discussion). In addition, subroutine references may make it very difficult to do even in places where BEGIN blocks don't come into play.
However, someone apparently made the attempt - I only know of it but never used it so buyer beware.
I don't think there is a "static" call-graph generator for Perl.
The next closest thing would be Devel::NYTProf.
The main goal is for profiling, but it's output can tell you how many times a subroutine has been called, and from where.
If you need to make sure every subroutine gets called, you could also use Devel::Cover, which checks to make sure your test-suite covers every subroutine.
I recently stumbled across a script while trying to solve find an answer to this same question. The script (linked to below) uses GraphViz to create a call graph of a Perl program or module. The output can be in a number of image formats.
http://www.teragridforum.org/mediawiki/index.php?title=Perl_Static_Source_Code_Analysis
I solved a similar problem recently, and would like to share my solution.
This tool was born out of desperation, untangling an undocumented part of a 30,000-line legacy script, in order to implement an urgent bug fix.
It reads the source code(s), uses GraphViz to generate a png, and then displays the image on-screen.
Since it uses simple line-by-line regexes, the formatting must be "sane" so that nesting can be determined.
If the target code is badly formatted, run it through a linter first.
Also, don't expect miracles such as parsing dynamic function calls.
The silver lining of a simple regex engine is that it can be easily extended for other languages.
The tool now also supports awk, bash, basic, dart, fortran, go, lua, javascript, kotlin, matlab, pascal, perl, php, python, r, raku, ruby, rust, scala, swift, and tcl.
https://github.com/koknat/callGraph