Use data structure references more commonly - perl

I have been reading some of the perl513*delta files and I have seen some of the new features coming in Perl 5.14. Beginning with Perl 5.13.7, many of the array/hash functions will work on array/hash refs as well. While this is probably seen mostly as syntactic sugar, or Perl doing what you expect, I wonder: will/should this change the paradigm of declaring data structures in Perl? With the known caveat that it breaks compatibility with earlier Perls, what would be the arguments for and against using anonymous structures primarily?
For example:
#!/usr/bin/env perl
use strict;
use warnings;
use 5.13.7;
my $hashref = {
english => 'hello',
spanish => 'hola',
french => 'bon jour'
};
foreach my $greeting (keys $hashref) {
say $hashref->{$greeting}; #use say since we need a later version anyway
}
rather than the more traditional way using a named hash (%hash).
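For reference, the traditional version of the same example would be roughly:
#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;    # only needed for say
my %hash = (
    english => 'hello',
    spanish => 'hola',
    french  => 'bon jour'
);
foreach my $greeting (keys %hash) {
    say $hash{$greeting};
}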
P.S. If this is seen as argumentative I can change to CW, but I am curious to hear some perspectives.

The ability to use certain array and hash functions on references is just syntactic sugar and need not impact the way you work with first level plural structures. There are several reasons for this:
Given my $array = [1 .. 10]:
List processing functions like map, grep, sort, reverse, print, say, printf and many others still need to be passed proper lists, so this means using @$array vs the simpler @array with these functions.
The for/foreach loop needs to be passed a list, requiring @$array
$array is always true; to get the length or test for emptiness you need to write @$array
while ($array) { infinite loop }
while (@$array) { what you probably wanted }
while (@array) { no room for error here }
Subscripting a real @array as $array[$idx] is marginally faster (~15%) than $array->[$idx], since a dereference does not need to happen on each access. The difference with hashes is smaller, around 3%, due to the overhead of the hashing function.
Basically, by moving to all references, you get a different set of functionality that needs to use the dereferencing sigils. Instead, take advantage of pre-v5.13.7 functionality for anything you are declaring for immediate use (my @array; my %hash;) and utilize the new syntax shortcuts in areas where you would otherwise have used excessive @{ ... } or %{ ... } constructs with the applicable functions.
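To make those points concrete, here is a quick side-by-side sketch (assuming the $array/@array pair below):
my $array = [1 .. 10];
my @array = (1 .. 10);

my @doubled = map { $_ * 2 } @$array;    # list functions still need the @$ dereference
print "not empty\n" if @$array;          # $array alone is always true; @$array isn't
print scalar @$array, "\n";              # 10 -- the length
print $array->[0], " vs ", $array[0], "\n";   # arrow access vs plain subscript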

I don't believe this upcoming change will break backward compatibility. Currently keys $hashref is an error; with Perl 5.14 it will work. So effectively no existing code could be relying on this feature.

Good syntactic sugar is important. Perl itself is "only" syntactic sugar over C, which sugars assembler, which sugars machine code.
This will not change my top-level usage, per your example, but it will help to reduce the awkward syntax found when using complex structures, i.e. "push @{$this->{somekey}}, $stuff" becomes "push $this->{somekey}, $stuff".
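A small sketch of that difference on a made-up nested structure (%greetings is purely illustrative); note that the autodereferencing shortcut was later declared experimental in 5.20 and removed again in 5.24, so the classic form is the one that still works everywhere:
my %greetings;

# classic dereferencing syntax -- works on every Perl 5
push @{ $greetings{informal} }, 'hi', 'hey';

# the 5.14-style shortcut the answer above refers to
# (experimental from 5.20, gone in 5.24, so do not rely on it today)
# push $greetings{informal}, 'yo';

print "@{ $greetings{informal} }\n";    # hi hey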


In perl, what does a parenthesized list of '$' mean in a sub declaration?

I have to debug someone else's code and ran across sub declarations that look like this...
sub mysub($$$$) {
<code here>
}
...also...
sub mysub($$$;$) {
<code here>
}
What does the parenthesized list of '$' (with optional ';') mean?
I ran an experiment and it doesn't seem to care if I pass more or fewer args to a sub declared this way than there are '$' in the list. I was thinking that it might be used to disambiguate two different subs with the same name, differing only by the number of args passed to them (as defined by the ($$$$) vs ($$$) vs ($$) etc.). But that doesn't seem to be it.
That's a Perl subroutine prototype. It's an old-school way of letting the parser know how many arguments to demand. Unless you know what they are going to do for you, I suggest you avoid prototypes in any new code; they don't gain you as much as you think. There's a newer, but experimental, way to do it better.
The elements after the ; are optional arguments. So, mysub($$$$) has four mandatory arguments, and mysub($$$;$) has three mandatory arguments and one optional argument.
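A quick way to see the mandatory/optional split (using a stand-in mysub that just reports how many arguments it got):
sub mysub($$$;$) { print scalar(@_), " argument(s)\n" }

mysub(1, 2, 3);       # fine: the fourth argument is optional
mysub(1, 2, 3, 4);    # fine: 4 argument(s)
# mysub(1, 2);        # compile-time error: Not enough arguments for main::mysub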
A little about parsing
Perl lets you be a bit loose about parentheses when you want to specify arguments, so these are the same:
print "Hello World";
print( "Hello World\n" );
This is one of Perl's philosophical points. When we can omit boilerplate, we should be able to.
Also, Perl lets you pass as many arguments as you like to a subroutine and you don't have to say anything about parameters ahead of time:
sub some_sub { ... }
some_sub( 1, 2, 3, 4 );
some_sub 1, 2, 3, 4; # same
This is another foundational idea of Perl: we have scalars and lists. Many things work on a list, and we don't care what's in it or how many elements it has.
But, some builtins take a definite number of arguments. The sin takes exactly one argument (but print takes zero to effectively infinity):
print sin 5, 'a'; # -0.958924274663138a (the trailing a comes from the 'a' argument)
The rand takes zero or one:
print rand; # 0.331390818188996
print rand 10; # 4.23956650382937
But then, you can define your own subroutines. Prototypes are a way to mimic that same behavior you see in the builtins (which I think is kinda cool but also not as motivating for production situations).
I tend to use parens in argument lists because I find it's easier for people to see what I intend (although not always with print, I guess):
print sin(5), 'a';
There's one interesting use of prototypes that I like. You can make your own syntax that works like map and grep block forms:
map { ... } @array;
If you want to play around with that (but still not subject maintenance programmers to it), check out Object::Iterate for a demonstration of it.
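As a toy illustration (apply_to_each is a made-up name, not Object::Iterate's interface), a (&@) prototype is what lets callers hand your sub a bare block, just like map:
sub apply_to_each (&@) {
    my ($code, @items) = @_;
    for my $item (@items) {
        local $_ = $item;    # let the block use $_ the way map/grep blocks do
        $code->();
    }
}

apply_to_each { print "got: $_\n" } 1, 2, 3;   # no 'sub' keyword needed before the block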
Experimental signatures
Perl v5.20 introduced an experimental signatures feature where you can give names to parameters. All of these are required:
use v5.20;
use feature qw(signatures);
sub mysub ( $name, $address, $phone ) { ... }
If you wanted an optional parameter, you can give it a default value:
sub mysub ( $name, $address, $phone = undef ) { ... }
Since this is an experimental feature, it warns whenever you use it. You can turn it off though:
no warnings qw(experimental::signatures);
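Putting those pieces together, a minimal self-contained sketch (greet is just an example sub):
use v5.20;
use feature qw(signatures);
no warnings qw(experimental::signatures);

sub greet ( $name, $greeting = 'Hello' ) {
    say "$greeting, $name!";
}

greet('World');            # Hello, World!
greet('World', 'Howdy');   # Howdy, World!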
This is interesting.
I ran an experiment and it doesn't seem to care if I pass more and fewer args to a sub declared this way than there are '$' in the list.
Because, of course, that's exactly what the code's author was trying to enforce.
There are two ways to circumvent the parameter counting that prototypes are supposed to enforce.
Call the subroutine as a method on an object ($my_obj->my_sub(...)) or on a class (MyClass->my_sub(...)).
Call the subroutine using the "old-style" ampersand syntax (&my_sub(...)).
From which we learn:
Don't use prototypes on subroutines that are intended to be used as methods.
Don't use the ampersand syntax for calling subroutines.
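A minimal sketch of those two bypasses (needs_two and My::Class are made up for the demonstration):
sub needs_two ($$) { print scalar(@_), " argument(s) received\n" }

# needs_two(1);          # ordinary call: compile-time error, the prototype is enforced
&needs_two(1);           # ampersand call: prototype ignored -- prints "1 argument(s) received"

sub My::Class::also_two ($$) { print scalar(@_) - 1, " argument(s) after the invocant\n" }
My::Class->also_two(1, 2, 3);   # method call: prototype ignored -- prints "3 argument(s) after the invocant"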

Change ref of hash in Perl

I ran into this and couldn't find the answer. I am trying to see if it is possible to "change" the reference of a hash. In other words, I have a hash, and a function that returns a hashref, and I want to make my hash point to the location in memory specified by this ref, instead of copying the contents of the hash it points to. The code looks something like this:
%hash = $h->hashref;
My obvious guess was that it should look like this:
\%hash = $h->hashref;
but that gives the error:
Can't modify reference constructor in scalar assignment
I tried a few other things, but nothing worked. Is what I am attempting actually possible?
An experimental feature which would seemingly allow you to do exactly what you're describing has been added to Perl 5.21.5, which is a development release (see "Aliasing via reference").
It sounds like you want:
use Data::Alias;
alias %hash = $h->hashref;
Or if %hash is a package variable, you can instead just do:
*hash = $h->hashref;
But either way, this should almost always be avoided; simply use the hash reference.
This question is really old, but Perl now allows this sort of thing as an experimental feature:
use v5.22;
use experimental qw(refaliasing);
my $first = {
foo => 'bar',
baz => 'quux',
};
\my %hash = $first;
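From then on %hash is not a copy but another name for %$first, so a quick check (continuing the snippet above) shows writes through either are visible in both:
$hash{foo} = 'changed';
say $first->{foo};    # changed -- %hash and %$first are the same hash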
See also: "Create named variable aliases with ref aliasing" and "Mix assignment and reference aliasing with declared_refs".
Yes, but…
References in Perl are scalars. You are trying to alias the return value. This actually is possible, but you should not do this, since it involves messing with the symbol table. Furthermore, this only works for globals (declared with our): If you assign a hashref to the glob *hash it will assign to the symbol table entry %hash:
#!/usr/bin/env perl
use warnings;
use strict;
sub a_hashref{{a => "one", b => "two"}}
our %hash;
*hash = a_hashref;
printf "%3s -> %s\n", $_, $hash{$_} foreach keys %hash;
This is bad style! It isn't in PBP (directly, but consider section 5.1: “non-lexicals should be avoided”) and won't be reported by perlcritic, but you shouldn't pollute the package namespace for a little syntactic fanciness. Furthermore it doesn't work with lexical variables (which is what you might want to use most of the time, because they are lexically scoped, not package wide).
Another problem is that if the $h->hashref method changes its return type, you'll suddenly assign to another table entry! (So if $h->hashref changes its return type to an arrayref, you assign to @hash; good luck detecting that.) You could circumvent that by checking whether $h->hashref really returns a hashref with 'HASH' eq ref $h->hashref, but that would defeat the purpose.
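A contrived sketch of that failure mode (a_ref is hypothetical and has quietly started returning an arrayref):
sub a_ref { [ 'one', 'two' ] }    # used to return a hashref...

our %hash;
our @hash;
*hash = a_ref();                  # silently fills @hash; %hash is untouched
print scalar(@hash), "\n";        # 2
print scalar(keys %hash), "\n";   # 0 -- the data went somewhere else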
What is the problem with just keeping the reference? If you get a reference, just store it in a scalar:
my $hash = $h->hashref;
To read more about the global symbol table, take a look at perlmod and consider perlref for the *FOO{THING} syntax, which sadly cannot be used as an lvalue.
To achieve what you want, you could check out the several aliasing modules on CPAN. Data::Alias or Lexical::Alias seem to fit your purpose. Also, if you are interested in tie semantics and/or don't want to use XS modules, Tie::Alias might be worth a shot.

The good, the bad, and the ugly of lexical $_ in Perl 5.10+

Starting in Perl 5.10, it is now possible to lexically scope the context variable $_, either explicitly as my $_; or in a given / when construct.
Has anyone found good uses of the lexical $_? Does it make any constructs simpler / safer / faster?
What about situations that it makes more complicated? Has the lexical $_ introduced any bugs into your code? (since control structures that write to $_ will use the lexical version if it is in scope, this can change the behavior of the code if it contains any subroutine calls (due to loss of dynamic scope))
In the end, I'd like to construct a list that clarifies when to use $_ as a lexical, as a global, or when it doesn't matter at all.
NB: as of perl5-5.24 these experimental features are no longer part of perl.
IMO, one great thing to come out of lexical $_ is the new _ prototype symbol.
This allows you to specify a subroutine so that it will take one scalar or if none is provided it will grab $_.
So instead of writing:
sub foo {
my $arg = @_ ? shift : $_;
# Do stuff with $arg
}
I can write:
sub foo(_) {
my $arg = shift;
# Do stuff with $_ or first arg.
}
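A quick usage sketch (echo is a made-up example) showing when the default kicks in:
sub echo(_) { print "got: ", shift, "\n" }

echo('explicit');        # got: explicit
for ('implicit') {
    echo();              # got: implicit -- $_ supplied automatically
}
$_ = 'topic';
echo;                    # got: topic -- works without parentheses too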
Not a big change, but it's just that much simpler when I want that behavior. Boilerplate removal is a good thing.
Of course, this has the knock-on effect of changing the prototypes of several builtins (e.g. chr), which may break some code.
Overall, I welcome lexical $_. It gives me a tool I can use to limit accidental data munging and bizarre interactions between functions. If I decide to use $_ in the body of a function, by lexicalizing it, I can be sure that whatever code I call, $_ won't be modified in calling code.
Dynamic scope is interesting, but for the most part I want lexical scoping. Add to this the complications around $_. I've heard dire warnings about the inadvisability of simply doing local $_;--that it is best to use for ( $foo ) { } instead. Lexicalized $_ gives me what I want 99 times out of 100 when I have localized $_ by whatever means. Lexical $_ makes a great convenience and readability feature more robust.
The bulk of my work has had to work with perl 5.8, so I haven't had the joy of playing with lexical $_ in larger projects. However, it feels like this will go a long way to make the use of $_ safer, which is a good thing.
I once found an issue (bug would be way too strong of a word) that came up when I was playing around with the Inline module. This simple script:
use strict qw(vars subs);
for ('function') {
$_->();
}
sub function {
require Inline;
Inline->bind(C => <<'__CODE__');
void foo()
{
}
__CODE__
}
fails with the error message "Modification of a read-only value attempted at /usr/lib/perl5/site_perl/5.10/Inline/C.pm line 380." Deep in the internals of the Inline module is a subroutine that wanted to modify $_, leading to the error above.
Using
for my $_ ('function') { ...
or otherwise declaring my $_ is a viable workaround to this issue.
(The Inline module was patched to fix this particular issue).
[ Rationale: A short additional answer with a quick summary for perl newcomers that may be passing by. When searching for "perl lexical topic" one can end up here.]
By now (2015) I suppose it is common knowledge that the introduction of the lexical topic (my $_ and some related features) led to some unintended behaviors that were difficult to detect at the outset, and so it was marked as experimental and then entered a deprecation stage.
Partial summary of RT #119315:
One suggestion was for something like use feature 'lextopic'; to make use of a new lexical topic variable: $^_.
Another point made was that an "implicit name for the topicalizing operator ... other than $_" would work best when combined with explicitly lexical functions (e.g. a lexical map or lmap). Whether these approaches would somehow make it possible to salvage given/when is not clear. In the afterlife of the experimental and deprecation phases, perhaps something may end up living on in the river of CPAN.
Haven't had any problems here, although I tend to follow somewhat of a "don't ask, don't tell" policy when it comes to Perl's magic. I.e. the routines are not usually expected to rely on their peers screwing with non-lexical data as a side effect, nor to let them.
I've tested code against various 5.8 and 5.10 versions of perl, while using a Camel book describing 5.6 for occasional reference. Haven't had any problems. Most of my stuff was originally done for perl 5.8.8.

Origin of discouraged perl idioms: &x(...) and sub x($$) { ... }

In my perl code I've previously used the following two styles of writing which I've later found are being discouraged in modern perl:
# Style #1: Using & before calling a user-defined subroutine
&name_of_subroutine($something, $something_else);
# Style #2: Using ($$) to show the number of arguments in a user-defined sub
sub name_of_subroutine($$) {
# the body of a subroutine taking two arguments.
}
Since learning that those styles are not recommended I've simply stopped using them.
However, out of curiosity I'd like to know the following:
What is the origin of those two styles of writing? (I'm sure I've not dreamt up the styles myself.)
Why are those two styles of writing discouraged in modern perl?
Have the styles been considered best practice at some point in time?
The & sigil is not commonly used with function calls in modern Perl for two reasons. First, it is largely redundant, since Perl will consider anything that looks like a function (followed by parens) a function. Secondly, there is a major difference between the way &function() and &function are executed, which may be confusing to less experienced Perl programmers. In the first case, the function is called with no arguments. In the second case, the function is called with the current @_ (and it can even make changes to the argument list which will be seen by later statements in that scope):
sub print_and_remove_first_arg {print 'first arg: ', shift, "\n"}
sub test {
&print_and_remove_first_arg;
print "remaining args: #_\n";
}
test 1, 2, 3;
prints
first arg: 1
remaining args: 2 3
So ultimately, using & for every function call ends up hiding the few &function; calls, which can lead to hard-to-find bugs. In addition, using the & sigil prevents the honoring of function prototypes, which can be useful in some cases (if you know what you are doing), but may also lead to hard-to-track-down bugs. Ultimately, & is a powerful modifier of function behavior, and should only be used when that behavior is desired.
Prototypes are similar, and their use should be limited in modern Perl. What must be stated explicitly is that prototypes in Perl are NOT function signatures. They are hints to the compiler that tell it to parse calls to those functions in a similar way as the built in functions. That is, each of the symbols in the prototype tells the compiler to impose that type of context on the argument. This functionality can be very helpful when defining functions that behave like map or push or keys which all treat their first argument differently than a standard list operator would.
sub my_map (&@) {...} # first arg is either a block or explicit code reference
my @ret = my_map {some_function($_)} 1 .. 10;
The reason sub ($$) {...} and similar uses of prototypes are discouraged is because 9 times out of 10 the author means "I want two args" and not "I want two args each with scalar context imposed on the call site". The former assertion is better written:
use Carp;
sub needs2 {
@_ == 2 or croak 'needs2 takes 2 arguments';
...
}
which would then allow the following calling style to work as expected:
my @array = (2, 4);
needs2 @array;
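For contrast, a sketch of what a ($$) prototype does with an array argument (scalar context collapses the array to its element count):
sub takes2 ($$) { print "got: @_\n" }

my @pair = (2, 4);
takes2 @pair, 'x';    # prints "got: 2 x" -- @pair became its length, not its elements
# takes2 @pair;       # would not even compile: Not enough arguments for main::takes2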
To sum up, both the & sigil and function prototypes are useful and powerful tools, but they should only be used when that functionality is required. Their superfluous use (or misuse as argument validation) leads to unintended behavior and difficult to track down bugs.
The & in function calls was mandatory in Perl 4, so maybe you picked that up from Programming Perl (1991) by Larry Wall and Randal L. Schwartz, as I did, or somewhere similar.
As for the function prototypes, my guess is less qualified. Maybe you have been mimicking languages where it makes sense and/or is mandatory to declare argument lists, and since function prototypes in Perl look a little like argument lists, you've started adding them?
&function is discouraged because it makes the code less readable and isn't necessary (the cases that &function is necessary are rare and often better avoided).
Function prototypes aren't argument lists, so most of the time they'll just confuse your reader or lull you into a false sense of rigidity, so no need to use those unless you know exactly why you are.
& was mandatory in Perl 4, so it has been best (indeed necessary) practice at some point. I don't think function prototypes ever have been.
For style #1, the & before the subroutine is only necessary if you have a subroutine that shares a name with a builtin and you need to disambiguate which one you wish to call, so that the interpreter knows what's going on. Otherwise, it's equivalent to calling the subroutine without &.
Since that's the case, I'd say its use is discouraged since you shouldn't be naming your subroutines with the same names as builtins, and it's good practice to define all your subroutines before you call them, for the sake of reading comprehension. In addition to this, if you define your subroutines before you call them, you can omit the parentheses, like in a builtin. Plus, just speaking visually, sticking & in front of every subroutine unnecessarily clutters up the file.
As for function prototypes, they were stuck into Perl after the fact and don't really do what they were made to do. From an article on perl.com:
For the most part, prototypes are more trouble than they're worth. For one thing, Perl doesn't check prototypes for methods because that would require the ability to determine, at compile time, which class will handle the method. Because you can alter @ISA at runtime--you see the problem. The main reason, however, is that prototypes aren't very smart. If you specify sub foo ($$$), you cannot pass it an array of three scalars (this is the problem with vec()). Instead, you have to say foo( $x[0], $x[1], $x[2] ), and that's just a pain.
In the end, it's better to comment your code to indicate what you intend for a subroutine to accept and do parameter checking yourself. As the article states, this is actually necessary for class methods, since no parameter checking occurs for them.
For what it's worth, Perl 6 adds formal parameter lists to the language like this:
sub do_something(Str $thing, Int $other) {
...
}

How can I make a static analysis call graph for Perl?

I am working on a moderately complex Perl program. As a part of its development, it has to go through modifications and testing. Due to certain environment constraints, running this program frequently is not an option that is easy to exercise.
What I want is a static call-graph generator for Perl. It doesn't have to cover every edge case (e.g., redefining variables to be functions or vice versa in an eval).
(Yes, I know there is a run-time call-graph generating facility with Devel::DProfPP, but run-time is not guaranteed to call every function. I need to be able to look at each function.)
Can't be done in the general case:
my $obj = Obj->new;
my $method = some_external_source();
$obj->$method();
However, it should be fairly easy to get a large number of the cases (run this program against itself):
#!/usr/bin/perl
use strict;
use warnings;
sub foo {
bar();
baz(quux());
}
sub bar {
baz();
}
sub baz {
print "foo\n";
}
sub quux {
return 5;
}
my %calls;
while (<>) {
next unless my ($name) = /^sub (\S+)/;
while (<>) {
last if /^}/;
next unless my @funcs = /(\w+)\(/g;
push @{$calls{$name}}, @funcs;
}
}
use Data::Dumper;
print Dumper \%calls;
Note, this misses
calls to functions that don't use parentheses (e.g. print "foo\n";)
calls to functions that are dereferenced (e.g. $coderef->())
calls to methods that are strings (e.g. $obj->$method())
calls that put the open parenthesis on a different line
other things I haven't thought of
It incorrectly catches
commented functions (e.g. #foo())
some strings (e.g. "foo()")
other things I haven't thought of
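To turn the %calls hash collected above into an actual picture, here is a minimal sketch that emits GraphViz DOT text; pipe the output through the dot tool (e.g. dot -Tpng -o callgraph.png):
# continuing with the %calls hash built by the script above
print "digraph calls {\n";
for my $caller (sort keys %calls) {
    my %seen;
    for my $callee (grep { !$seen{$_}++ } @{ $calls{$caller} }) {
        print qq{    "$caller" -> "$callee";\n};
    }
}
print "}\n";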
If you want a better solution than that worthless hack, it is time to start looking into PPI, but even it will have problems with things like $obj->$method().
Just because I was bored, here is a version that uses PPI. It only finds function calls (not method calls). It also makes no attempt to keep the names of the subroutines unique (i.e. if you call the same subroutine more than once it will show up more than once).
#!/usr/bin/perl
use strict;
use warnings;
use PPI;
use Data::Dumper;
use Scalar::Util qw/blessed/;
sub is {
my ($obj, $class) = @_;
return blessed($obj) && $obj->isa($class);
}
my $program = PPI::Document->new(shift);
my $subs = $program->find(
sub { $_[1]->isa('PPI::Statement::Sub') and $_[1]->name }
);
die "no subroutines declared?" unless $subs;
for my $sub (@$subs) {
print $sub->name, "\n";
next unless my $function_calls = $sub->find(
sub {
$_[1]->isa('PPI::Statement') and
$_[1]->child(0)->isa("PPI::Token::Word") and
not (
$_[1]->isa("PPI::Statement::Scheduled") or
$_[1]->isa("PPI::Statement::Package") or
$_[1]->isa("PPI::Statement::Include") or
$_[1]->isa("PPI::Statement::Sub") or
$_[1]->isa("PPI::Statement::Variable") or
$_[1]->isa("PPI::Statement::Compound") or
$_[1]->isa("PPI::Statement::Break") or
$_[1]->isa("PPI::Statement::Given") or
$_[1]->isa("PPI::Statement::When")
)
}
);
print map { "\t" . $_->child(0)->content . "\n" } #$function_calls;
}
I'm not sure it is 100% feasible (since Perl code cannot be statically analyzed in theory, due to BEGIN blocks and such - see a very recent SO discussion). In addition, subroutine references may make it very difficult to do even in places where BEGIN blocks don't come into play.
However, someone apparently made the attempt - I only know of it but never used it so buyer beware.
I don't think there is a "static" call-graph generator for Perl.
The next closest thing would be Devel::NYTProf.
The main goal is profiling, but its output can tell you how many times a subroutine has been called, and from where.
If you need to make sure every subroutine gets called, you could also use Devel::Cover, which checks to make sure your test-suite covers every subroutine.
I recently stumbled across a script while trying to find an answer to this same question. The script (linked below) uses GraphViz to create a call graph of a Perl program or module. The output can be in a number of image formats.
http://www.teragridforum.org/mediawiki/index.php?title=Perl_Static_Source_Code_Analysis
I solved a similar problem recently, and would like to share my solution.
This tool was born out of desperation, untangling an undocumented part of a 30,000-line legacy script, in order to implement an urgent bug fix.
It reads the source code(s), uses GraphViz to generate a png, and then displays the image on-screen.
Since it uses simple line-by-line regexes, the formatting must be "sane" so that nesting can be determined.
If the target code is badly formatted, run it through a linter first.
Also, don't expect miracles such as parsing dynamic function calls.
The silver lining of a simple regex engine is that it can be easily extended for other languages.
The tool now also supports awk, bash, basic, dart, fortran, go, lua, javascript, kotlin, matlab, pascal, perl, php, python, r, raku, ruby, rust, scala, swift, and tcl.
https://github.com/koknat/callGraph