Perl function/sub best practice - perl

I have a really quick question. I have a program with a lot of functions that are run from main. Is it best practice to have the functions first and then the call from main, or the other way around?
For example:
sub myFunction {
#Do something
}
my $stuff = myFunction();
Or:
my $stuff = myFunction();
sub myFunction {
#Do something
}
Sorry for any ignorance, I do not have any formal training and I have seen it done both ways online. Thanks

I recommend placing your code at the bottom.
Issue 1
The latter snippet is poor because $stuff is in scope inside myFunction, but it shouldn't be. That's easy to fix, though.
{
my $stuff = myFunction();
}
sub myFunction {
#Do something
}
Ok, so that's not a big issue since I place all top-level code in a block, even if it comes at the end. It looks cleaner to me that way, and it makes it easier to transform into a sub from which I can return.
sub myFunction {
#Do something
}
sub main {
return 0 if is_nothing_to_do();
my $stuff = myFunction();
...
return 0;
}
exit(main(parse_args));
Issue 2
Many languages require that you declare your subs before you call them. That's rarely needed in Perl, though there are a couple of scenarios where it is required. Subs with prototypes are one of those. If you wanted to place your code at the top, you would need to add declarations even before that.
sub myFunction(&@);
{
my $stuff = myFunction { ... } ...;
}
sub myFunction(&@) {
#Do something
}
You will probably never have to do that, since all but some rare uses of prototypes are discouraged, and the other scenarios are even rarer.
Issue 3
You might accidentally skip initialization code by placing your top-level code before your subroutines.
Compare:
print my_counter(), "\n"; # Prints 0, not 1: $counter has not been initialized yet
...
{
my $counter = 1;
sub my_counter {
return $counter++;
}
}
...
and
...
{
my $counter = 1;
sub my_counter {
return $counter++;
}
}
...
print my_counter(), "\n"; # Prints 1
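If the top-level code has to stay above the subs, one way to keep the initialization from being skipped is to run it at compile time. The following is a minimal sketch, assuming the same my_counter sub as above, with the bare block turned into a BEGIN block (the pattern described under Persistent Private Variables in perlsub):

```perl
use strict;
use warnings;

# Top-level code first: this works only because the BEGIN block below
# has already run while the file was being compiled.
my $first = my_counter();
print "$first\n";

BEGIN {
    my $counter = 1;    # executed at compile time, before any top-level code
    sub my_counter {
        return $counter++;
    }
}
```

The trade-off is that the initialization now happens even under perl -c, so it is best kept to cheap, side-effect-free assignments.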
Issue 4
Many languages require that you declare your subs before you call them, so more people will be familiar with having the top-level code at the bottom.

It doesn't matter, so long as you're able to find the code that you need to find. I typically like to set up my code like this:
use strict;
use warnings;
exit main();
sub main {
do_this();
dont_do_that();
cant_you_read_the_signs();
return 0;
}
sub do_this {
....
}
...
Putting your main code in an actual function or block called "main" helps keep you from polluting the program with globals.

Related

Preserve a variable's value across multiple subroutine calls in perl

I just wanted to know the best way to preserve the value of a variable across multiple calls to the same subroutine, i.e.:
$someList = &someRoutine(100, 200);
$someList2 = &someRoutine(100, 200);
sub someRoutine {
$someAddition = 0;
foreach $someSum (@_){
$someAddition += $someSum;
}
return $someAddition
}
print $someList;
print $someList2;
Basically, someList should print 300 and someList2 should print 600. How do I make it so that someList2 prints 600? I want $someAddition to be preserved across multiple subroutine calls.
There are several ways to do it. I'll demonstrate two of them:
First, in modern versions of Perl you can use state:
use feature qw/state/;
print someRoutine(100,200), "\n";
print someRoutine(100,200), "\n";
sub someRoutine {
state $someAddition = 0;
foreach my $someSum ( @_ ) {
$someAddition += $someSum;
}
return $someAddition;
}
In this version, the $someAddition variable will be initialized to zero once, and only once. From that point on, the value will be retained between calls.
Another version is using a lexical closure. Here's an example:
my $summer = makeSummer();
print $summer->(100,200), "\n";
print $summer->(100,200), "\n";
sub makeSummer {
my $someAddition = 0;
return sub {
$someAddition += $_ foreach @_;
return $someAddition;
}
}
The second version is a little more complex, but has two advantages. First, you can start a fresh summation simply by calling the makeSummer routine to manufacture a new closure. Second, it will work on any version of Perl 5, not just versions recent enough to have the newer state keyword.
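To illustrate the first advantage, here is a small sketch that reuses makeSummer from above and creates two closures. The variable names $first and $second are only illustrative. Each closure keeps its own private $someAddition, so the running totals do not interfere:

```perl
use strict;
use warnings;

sub makeSummer {
    my $someAddition = 0;
    return sub {
        $someAddition += $_ foreach @_;
        return $someAddition;
    };
}

my $first  = makeSummer();
my $second = makeSummer();    # a fresh summation with its own counter

$first->(100, 200);                        # $first's total is now 300
my $first_total  = $first->(100, 200);     # 600
my $second_total = $second->(1, 2);        # 3, unaffected by $first

print "$first_total $second_total\n";
```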
If you are not concerned with initializing the stateful variable before the sub is declared, you can also do this:
my $someAddition;
sub someRoutine {
$someAddition = 0 unless defined $someAddition;
foreach my $someSum ( @_ ) {
$someAddition += $someSum;
}
return $someAddition;
}
A fourth method makes use of package globals. I save this one for last because it's the most prone to abuse and mistakes. But here you go;
our $someAddition = 0;
someRoutine(100,200);
print "$someAddition\n";
someRoutine(100,200);
print "$someAddition\n";
sub someRoutine {
$someAddition += $_ foreach @_;
}
In this last version, $someAddition is a package global, and its global scope makes it available both inside and outside of any subroutines living within the same namespace.
I assume you're at least using a variant of Perl 5? It has been bad practice to use ampersands & on subroutine calls since the first version of Perl 5 twenty-two years ago.
It is also vital that you use strict and use warnings at the top of every Perl program, and declare your variables at their first point of use with my. It is a measure that will uncover many simple coding errors that you can otherwise easily overlook.
Perl variable names should use only lower-case letters, digits, and underscores. Capital letters are reserved for global identifiers such as package names.
By far the simplest and most common way of creating a static variable is just to declare it outside the subroutine. Like this
use strict;
use warnings;
my $some_list = some_routine(100, 200);
my $some_list2 = some_routine(100, 200);
my $some_addition;
sub some_routine {
$some_addition += $_ for @_;
return $some_addition
}
print $some_list, "\n";
print $some_list2, "\n";
output
300
600
If you want to protect the variable from being accessed by any following code other than the subroutine, then just enclose them in braces, like this
{
my $some_addition;
sub some_routine {
$some_addition += $_ for @_;
return $some_addition
}
}
Take a look at Persistent Private Variables in "man perlsub".

Alternative to "last" in do loops

According to the Perl manual for last (http://perldoc.perl.org/functions/last.html), last can't be used to break out of do {} loops, but it doesn't mention an alternative. The script I'm maintaining has this structure:
do {
...
if (...)
{
...
last;
}
} while (...);
and I'm pretty sure the author wants to go to the end of the loop, but it's actually exiting the current subroutine, so I need to either change the last or refactor the whole loop if there is a better way that someone can recommend.
Wrap the do "loop" in a bare block (which is a loop):
{
do {
...
if (...)
{
...
last;
}
} while (...);
}
This works for last and redo, but not next; for that place the bare block inside the do block:
do {{
...
if (...)
{
...
next;
}
...
}} while (...);
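As a concrete illustration of both placements, here is a small runnable sketch. The counters and the arrays @seen and @odd are made up for the example and are not part of the original script:

```perl
use strict;
use warnings;

# 'last': bare block wrapped AROUND the do {} while.
my @seen;
my $i = 0;
{
    do {
        $i++;
        last if $i == 3;    # leaves the enclosing bare block
        push @seen, $i;
    } while ($i < 10);
}
print "@seen\n";

# 'next': bare block placed INSIDE the do block (note the doubled braces).
my @odd;
my $j = 0;
do {{
    $j++;
    next if $j % 2 == 0;    # jumps to the end of the inner bare block,
                            # so the while condition is still evaluated
    push @odd, $j;
}} while ($j < 5);
print "@odd\n";
```

The first loop stops as soon as $i reaches 3, and the second one skips the even values while still testing the loop condition on every pass.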
do BLOCK while (EXPR) is funny in that do is not really a loop structure. So, last, next, and redo are not supposed to be used there. Get rid of the last and adjust the EXPR to evaluate false when that situation is found.
Also, turn on warnings, which should give you at least a warning here.
I've never been a fan of do/while loops in Perl. The do isn't really a loop, which is why last won't break out of it. In our old Pascal daze you couldn't exit a loop in the middle because that would be wrong according to the sage Niklaus "One entrance/one exit" Wirth. Therefore, we had to create an exit flag. In Perl it'd look something like this:
my $endFlag = 0;
do {
...
if (...)
{
...
$endFlag = 1;
}
} while ((...) and (not $endFlag));
Now you can see why Pascal never caught on.
Why not just use a while loop?
while (...) {
...
if (...) {
last;
}
}
You might have to change your logic slightly to accommodate the fact that your test is at the beginning instead of end of your loop, but that should be trivial.
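One way to make that adjustment without changing behaviour is sketched below, assuming a made-up loop body that simply counts. Moving the original condition to the bottom of an infinite while loop keeps the run-at-least-once semantics of do {} while, and last works because while is a real loop:

```perl
use strict;
use warnings;

my @visited;
my $n = 0;
while (1) {
    $n++;
    push @visited, $n;
    last if $n == 4;        # the early exit that used to sit inside do {}
    last unless $n < 10;    # the condition that used to follow 'while'
}
print "@visited\n";
```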
By the way, you actually CAN break out of a Pascal loop if you're using Delphi, and Delphi DID catch on for a little while until Microsoft wised up and came out with the .net languages.
From "http://perldoc.perl.org/functions/last.html":
last cannot be used to exit a block that returns a value such as eval {} , sub {} or do {} , and should not be used to exit a grep() or map() operation.
So, use a boolean in the 'while()' and set it where you have 'last'...
Late to the party - I've been messing with for(;;) recently. In my rudimentary testing, for conditional expressions A and B, what you want to do with:
do {
last if A;
} while(B);
can be accomplished as:
for(;; B || last) {
last if A;
}
A bit ugly, but perhaps not more so than the other workarounds :) . An example:
my $i=1;
for(;; $i<=3 || last) {
print "$i ";
++$i;
}
Outputs 1 2 3. And you can combine the increment if you want:
my $i=1;
for(;; ++$i, $i<=3 || last) {
print "$i ";
}
(using || because it has higher precedence than ,)

How do I run code only after Perl's File::Find finishes?

Perl question for you:
#!/usr/bin/perl
use File::Find;
#Find files
find(\&wanted, $dir);
sub wanted {
#Do something
}
#Done going through all files, do below:
other stuff { }
So, I basically want to parse a directory and find certain kinds of files. I can do that successfully with File::Find. However, my next step is, once I'm done searching the files, I want to do my next process.
The problem is that whatever I put after the sub wanted { #Do something } gets executed every time the file I want is not found! I know it is only logical for the program to do that. But could you tell me what I'd need to do to accomplish this:
1] Find files : using > sub wanted { #Do something }
2] While no more files to search : >do something else { }
Thanks!
UPDATED ANSWER: (after OP clarified that he simply wants to run some code after find() finishes searching):
Since find() is not searching in parallel, it will simply return when all of the search is completed. Therefore you don't need to do ANYTHING special to achieve your goal:
find(\&wanted, @directories_to_search);
# Here be code that runs after search completes.
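A self-contained sketch of that flow follows. The temporary directory, the file names and the .txt filter are assumptions made up for the example. The point is only that the code after find() runs once, after every file has been visited:

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Throwaway directory with two .txt files and one .log file (illustrative data).
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a.txt b.txt c.log)) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $name: $!";
    close $fh;
}

my @found;
sub wanted {
    # $_ holds the current file name; by default File::Find has
    # already changed into the directory that contains it.
    push @found, $_ if -f $_ && /\.txt\z/;
}

find(\&wanted, $dir);

# find() has returned, so the whole tree has been visited.
my $count = @found;
print "Found $count .txt file(s)\n";
```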
ORIGINAL ANSWER
You can set a flag in the wanted subroutine to record whether any files were found:
my $files_not_found = 1;
find(\&wanted, @directories_to_search);
sub wanted { @args=@_; $files_not_found = 0; }
if ($files_not_found) {
print "No files found!\n";
}
# Here you do things after find is finished.

How can I cleanly handle error checking in Perl?

I have a Perl routine that manages error checking. There are about 10 different checks and some are nested, based on prior success. These are typically not exceptional cases where I would need to croak/die. Also, once an error occurs, there's no point in running through the rest of the checks.
However, I can't seem to think of a neat way to solve this issue except by using something analogous to the following horrid hack:
sub lots_of_checks
{
if(failcond)
{
goto failstate;
}
elsif(failcond2)
{
goto failstate;
}
#This continues on and on until...
return 1; #O happy day!
failstate:
return 0; #Dead...
}
What I would prefer to be able to do would be something like so:
do
{
if(failcond)
{
last;
}
#...
};
An empty return statement is a better way of returning false from a Perl sub than returning 0. The latter value will actually be true in list context:
sub lots_of_checks {
return if fail_condition_1;
return if fail_condition_2;
# ...
return 1;
}
Perhaps you want to have a look at the following articles about exception handling in perl5:
perl.com: Object Oriented Exception Handling in Perl
perlfoundation.com: Exception Handling in Perl
You absolutely can do what you prefer.
Check: {
last Check
if failcond1;
last Check
if failcond2;
success();
}
Why would you not use exceptions? Any case where the normal flow of the code should not be followed is an exception. Using "return" or "goto" is really the same thing, just more "not what you want".
(What you really want are continuations, which "return", "goto", "last", and "throw" are all special cases of. While Perl does not have full continuations, we do have escape continuations; see http://metacpan.org/pod/Continuation::Escape)
In your code example, you write:
do
{
if(failcond)
{
last;
}
#...
};
This is probably the same as:
eval {
if(failcond){
die 'failcond';
}
}
If you want to be tricky and ignore other exceptions:
my $magic = [];
eval {
if(failcond){
die $magic;
}
};
if ($@ and not (ref $@ and $@ == $magic)) {
die $@; # rethrow
}
Or, you can use the Continuation::Escape module mentioned above. But
there is no reason to ignore exceptions; it is perfectly acceptable
to use them this way.
Given your example, I'd write it this way:
sub lots_of_checks {
local $_ = shift; # 'my $_' was available from 5.10 but was removed in 5.24
return if /condition1/;
return if /condition2/;
# etc.
return 1;
}
Note the bare return instead of return 0. This is usually better because it respects context; the value will be undef in scalar context and () (the empty list) in list context.
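The difference is easiest to see when the result is assigned to an array. This is a small sketch with two hypothetical subs, returns_zero and returns_nothing, that are not part of the original code:

```perl
use strict;
use warnings;

sub returns_zero    { return 0 }    # a one-element list in list context
sub returns_nothing { return }      # empty list in list context, undef in scalar

my @zero    = returns_zero();
my @nothing = returns_nothing();

# An array is true when it has at least one element, so (0) counts as true.
my $zero_is_true    = @zero    ? 1 : 0;
my $nothing_is_true = @nothing ? 1 : 0;

my $scalar = returns_nothing();     # undef in scalar context

print "return 0: $zero_is_true, bare return: $nothing_is_true\n";
```

So a caller that writes if (my @result = lots_of_checks(...)) sees a failure reported with return 0 as success, while a bare return behaves as expected.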
If you want to hold to a single-exit point (which is slightly un-Perlish), you can do it without resorting to goto. As the documentation for last states:
... a block by itself is semantically identical to a loop that executes once.
Thus "last" can be used to effect an early exit out of such a block.
sub lots_of_checks {
local $_ = shift;
my $all_clear;
{
last if /condition1/;
last if /condition2/;
# ...
$all_clear = 1; # only set if all checks pass
}
return unless $all_clear;
return 1;
}
If you want to keep your single in/single out structure, you can modify the other suggestions slightly to get:
sub lots_of_checks
{
goto failstate if failcond1;
goto failstate if failcond2;
# This continues on and on until...
return 1; # O happy day!
failstate:
# Any clean up code here.
return; # Dead...
}
IMO, Perl's use of the statement modifier form "return if EXPR" makes guard clauses more readable than they are in C. When you first see the line, you know that you have a guard clause. This feature is often denigrated, but in this case I am quite fond of it.
Using the goto with the statement modifier retains the clarity, and reduces clutter, while it preserves your single exit code style. I've used this form when I had complex clean up to do after failing validation for a routine.

How can I detect recursing package calls in Perl?

I have a Perl project where I just had a problem caused by a circular package call. The code below demonstrates the problem.
When this is executed, each package will call the other until all of the memory of the computer is consumed and it locks up. I agree that this is a bad design and that circular calls like this should not be made in the design, but my project is sufficiently big that I would like to detect this at run time.
I have read about the weaken function and Data::Structure::Util, but I have not figured out a way to detect if there is a circular package load (I assume that is because a new copy is being made at each iteration and stored in each copy of the $this hash). Any ideas?
use system::one;
my $one = new system::one();
package system::one;
use strict;
use system::two;
sub new {
my ($class) = @_;
my $this = {};
bless($this,$class);
# attributes
$this->{two} = new system::two();
return $this;
}
package system::two;
use strict;
use system::one;
sub new {
my ($class) = @_;
my $this = {};
bless($this,$class);
# attributes
$this->{one} = new system::one();
return $this;
}
Here, have some code too. :)
use Carp qw(croak);
sub break_recursion(;$) {
my $allowed = @_ ? shift : 1;
my @caller = caller(1);
my $call = $caller[3];
my $count = 1;
for(my $ix = 2; @caller = caller($ix); $ix++) {
croak "found $count levels of recursion into $call"
if $caller[3] eq $call && ++$count > $allowed;
}
}
sub check_recursion(;$) {
my $allowed = @_ ? shift : 1;
my @caller = caller(1);
my $call = $caller[3];
my $count = 1;
for(my $ix = 2; @caller = caller($ix); $ix++) {
return 1
if $caller[3] eq $call && ++$count > $allowed;
}
return 0;
}
These are called like:
break_recursion(); # to die on any recursion
break_recursion(5); # to allow up to 5 levels of recursion
my $recursing = check_recursion(); # to check for any recursion
my $recursing = check_recursion(10); # to check to see if we have more than 10 levels of recursion.
Might CPAN these, I think. If anyone has any thoughts about that, please share.
The fact that these are in separate packages has nothing at all to do with the fact that this runs infinitely, consuming all available resources. You're calling two methods from within one another. This isn't circular reference, it's recursion, which is not the same thing. In particular, weaken won't help you at all. You'd get exactly the same effect from:
sub a {
b();
}
sub b {
a();
}
a();
The best way to avoid this is don't do that. More usefully, if you have to write recursive functions try not to use multiple functions in the recursion chain, but simply the one, so you have an easier time mentally keeping track of where your calls should terminate.
As to how to detect whether something like this is happening, you would have to do something simple like increment a variable with your recursion depth and terminate (or return) if your depth exceeds a certain value. But you really shouldn't have to rely on that, it's similar to writing a while loop and using an increment there to make sure your function doesn't run out of control. Just don't recurse over a set unless you know how and when it terminates.
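A minimal sketch of such a depth counter follows. The sub names a_sub and b_sub and the limit of 50 are arbitrary choices for illustration. Using local on a package variable means the counter is restored automatically however the call ends:

```perl
use strict;
use warnings;

our $depth = 0;          # package variable, so that 'local' can be applied to it
my  $max_depth = 50;     # arbitrary limit chosen for this example

sub a_sub {
    local $depth = $depth + 1;    # undone automatically on return or die
    die "recursion deeper than $max_depth levels\n" if $depth > $max_depth;
    b_sub();
}

sub b_sub {
    a_sub();
}

my $ok    = eval { a_sub(); 1 };
my $error = $@;
print "stopped: $error" unless $ok;
```

As the answer says, this is a safety net rather than a design: the real fix is for the recursion to have a well-defined termination condition.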
Another relevant question would be what are you trying to accomplish in the first place?
I suggest making a routine called something like break_constructor_recursion() that uses caller() to examine the call stack like so:
Find out what method in what package just called me.
Look up the rest of the call stack seeing if that same method in that same package is anywhere further up.
If so, die() with something appropriate.
Then you add a call to break_constructor_recursion() in your constructors. If the constructor is being called from inside itself, it'll bomb out.
Now, this can throw false positives; it's not impossible for a constructor to be legitimately called inside itself. If you have issues with that, I'd say just have it look for some N additional occurrences of the constructor before it identifies an error. If there are 20 calls to system::two::new() on the stack, the chances that you aren't recursing are pretty low.
The classic break on double recursion is to use a state variable to determine if you are already inside a function:
{
my $in_a;
sub a {
return if $in_a; #do nothing if b(), or someone b() calls, calls a()
$in_a = 1;
b();
$in_a = 0;
}
}
You can do whatever you want if $in_a is true, but dying or returning is common. If you are using Perl 5.10 or later, you can use a state variable (enabled with use feature 'state') instead of nesting the function in its own scope:
sub a {
state $in_a;
return if $in_a; #do nothing if b(), or someone b() calls, calls a()
$in_a = 1;
b();
$in_a = 0;
}
use warnings;
without warnings:
#!/usr/bin/perl
use strict;
sub foo {
foo();
}
foo();
-
$ perl script.pl
^C # after death
with warnings:
#!/usr/bin/perl
use strict;
use warnings;
sub foo {
foo();
}
foo();
-
$ perl script.pl
Deep recursion on subroutine "main::foo" at script.pl line 7.
^C # after death
Always always use warnings.
use warnings FATAL => qw( recursion );
#!/usr/bin/perl
use strict;
use warnings FATAL => qw( recursion );
sub foo {
foo();
}
foo();
-
$ perl script.pl
Deep recursion on subroutine "main::foo" at script.pl line 7.
$
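Because a FATAL warning is raised as an ordinary exception, it can also be trapped with eval instead of ending the program. A small sketch, assuming the same runaway foo as above plus a hypothetical $calls counter added for illustration (perldiag describes this warning as firing once a sub has called itself 100 times more than it has returned):

```perl
use strict;
use warnings FATAL => qw( recursion );

my $calls = 0;
sub foo {
    $calls++;
    foo();
}

# The deep-recursion warning is now a die, so eval can catch it.
my $ok    = eval { foo(); 1 };
my $error = $@;
print "caught: $error" unless $ok;
```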