Strict vs non-strict: Which instance of a named variable gets used? - perl

I am trying to find the problem and propose a solution for the following Perl code.
A file without strict nor warnings on it has a function that uses a $variable without declaring it. So that variable is global to the file and the changes for that variable in this specific function are used outside of it (since it is global for the file).
Due to a recent update, this old_file now requires a modified version of itself (new_file) in which the same function is defined. But this new version has strict and warnings, so the same variable is defined, but this time as 'my' in the new function, and is returned in the end.
The tricky thing is that the code in the old_file did not change so it still expects the variable to be changed as its own global variable.
Since I don't know Perl well enough to be able to determine which version of this function is used (and since I can't test it, due to IT restrictions) I need an explanation of the behavior, possibly a link to a good paper about that topic.
Code: (I think the problem is in the variable LISTEREPONSE from the function start_handler.)
old_file:
use XML::Parser;
my $parser = new XML::Parser( ErrorContext => 2 );
$parser->setHandlers(
Start => \&start_handler,
End => \&end_handler,
Char => \&char_handler
);
$parser->parse(<$remote>);
close $remote;
...
sub start_handler {
my $expat = shift;
my $element = shift;
print;
while (#_) {
my $att = shift;
my $val = shift;
$LISTEREPONSE .= "$att=$val&";
}
}
new_file:
sub start_handler {
my $expat = shift;
my $element = shift;
print;
my $LISTEREPONSE;
while (#_) {
my $att = shift;
my $val = shift;
$LISTEREPONSE .= "$att=$val&";
}
return $LISTEREPONSE;
}

In strict mode, if you need $LISTEREPONSE become a global variable in package(file) scope.
Just declare (my $LISTEREPONSE;) in the beginning of file (after use).
In second case, $LISTEREPONSE is declare in sub, it's lexical scope and only available in sub.
my $LISTEREPONSE;
# ...
sub some_sub {
$LISTEREPONSE .= $some_stuff;
}

Related

can a variable be defined in perl whose value cannot be changed in a subroutine?

In the following script, I declare and modify #basearray in the main program. Inside the dosomething subroutine, I access #basearray, assign it to an array local to the script, and modify the local copy. Because I have been careful to change the value only of local variables inside the subroutine, #basearray is not changed.
If I had made the mistake of assigning a value to #basearray inside the subroutine, though, it would have been changed and that value would have persisted after the call to the subroutine.
This is demonstrated in the 2nd subroutine, doagain.
Also, doagain receives the reference \#basearray as an argument rather than accessing #basearray directly. But going to that additional trouble provides no additional safety. Why do it that way at all?
Is there a way to guarantee that I cannot inadvertently change #basearray inside any subroutine? Any kind of hard safety device that I can build into my code, analogous to use strict;, some combination perhaps of my and local?
Am I correct in thinking that the answer is No, and that the only solution is to not make careless programmer errors?
#!/usr/bin/perl
use strict; use warnings;
my #basearray = qw / amoeba /;
my $count;
{
print "\#basearray==\n";
$count = 0;
foreach my $el (#basearray) { $count++; print "$count:\t$el\n" };
}
sub dosomething
{
my $sb_name = (caller(0))[3];
print "entered $sb_name\n";
my #sb_array=( #basearray , 'dog' );
{
print "\#sb_array==\n";
$count = 0;
foreach my $el (#sb_array) { $count++; print "$count:\t$el\n" };
}
print "return from $sb_name\n";
}
dosomething();
#basearray = ( #basearray, 'rats' );
{
print "\#basearray==\n";
$count = 0;
foreach my $el (#basearray) { $count++; print "$count:\t$el\n" };
}
sub doagain
{
my $sb_name = (caller(0))[3];
print "entered $sb_name\n";
my $sf_array=$_[0];
my #sb_array=#$sf_array;
#sb_array=( #sb_array, "piglets ... influenza" );
{
print "\#sb_array==\n";
$count = 0;
foreach my $el (#sb_array) { $count++; print "$count:\t$el\n" };
}
print "now we demonstrate that passing an array as an argument to a subroutine does not protect it from being globally changed by programmer error\n";
#basearray = ( #sb_array );
print "return from $sb_name\n";
}
doagain( \#basearray );
{
print "\#basearray==\n";
$count = 0;
foreach my $el (#basearray) { $count++; print "$count:\t$el\n" };
}
There isn't a pragma or a keyword or such, but there are well established "good practices," which in this case completely resolve what you reasonably ponder about.
The first sub, dosomething, commits the sin of using variables visible in its scope but defined in the higher scope. Instead, always pass needed data to a subroutine (exceptions are rare, in crystal clear cases).
Directly using data from "outside" defies the idea of a function as an encapsulated procedure, exchanging data with its users via a well defined and clear interface. It entangles ("couples") sections of code that are in principle completely unrelated. In practice, it can also be outright dangerous.
Also, the fact the #basearray is up for grabs in the sub is best considered an accident -- what when that sub gets moved to a module? Or another sub is introduced to consolidate code where #basearray is defined?
The second sub, doagain, nicely takes a reference to that array. Then, to protect the data in the caller, one can copy the caller's array to another one which is local to the sub
sub doagain {
my ($ref_basearray) = #_;
my #local_ba = #$ref_basearray;
# work with local_ba and the caller's basearray is safe
}
The names of local lexical variables are of course arbitrary, but a convention where they resemble the caller's data names may be useful.
Then you can adopt a general practice, for safety, to always copy input variables to local ones. Work directly with references that are passed in only when you want to change the caller's data (relatively rare in Perl). This may hurt efficiency if it's done a lot with sizeable data, or when really large data structures are involved. So perhaps then make an exception and change data via its reference, and be extra careful.
(Putting my comment as answer)
One way to guarantee not changing a variable inside a subroutine is to not change it. Use only lexically scoped variables inside the subroutine, and pass whatever values you need inside the subroutine as arguments to the subroutine. It is a common enough coding practice, encapsulation.
One idea that you can use -- mainly as practice, I would say -- to force yourself to use encapsulation, is to put a block around your "main" code, and place subroutines outside of it. That way, if you should accidentally refer to a (formerly) global variable, use strict will be able to do it's job and produce a fatal error. Before runtime.
use strict;
use warnings;
main: { # lexical scope reduced to this block
my #basearray = qw / amoeba /;
print foo(#basearray); # works
print bar(); # fatal error
} # END OF MAIN lexical scope of #basearray ends here
sub foo {
my #basearray = #_; # encapsulated
return $basearray[1]++;
}
sub bar {
return $basearray[1]++; # out of scope ERROR
}
This will not compile, and will produce the error:
Global symbol "#basearray" requires explicit package name at foo.pl line 15.
Execution of foo.pl aborted due to compilation errors.
I would consider this a training device to force yourself to using good coding practices, and not something to necessarily use in production code.
There are several solutions with various levels of pithiness from "just don't change it" to "use an object or tied array and lock down the update functions". An intermediate solution, not unlike using an object with a getter method, is to define a function that returns your array but can only operate as an rvalue, and to use that function inside subroutines.
my #basearray = (...);
sub basearray { return #basearray }
sub foo {
foreach my $elem (basearray()) {
...
}
#bar = map { $_ *= 2 } basearray(); # ok
#bar = map { $_ *= 2 } #basearray; # modifies #basearray!
}
TLDR: yes, but.
I'll start with the "but". But it's better to design your code so that the variable simply doesn't exist in the scope where the untrusted function is defined.
sub untrusted_function {
...
}
my #basearray = qw( ... ); # declared after untrusted_function
If untrusted_function needs to be able to access the contents of the array, pass it a copy of the array as a parameter, so it can't modify the original.
Now here's the "yes".
You can mark the array as read-only before calling the untrusted function.
Internals::SvREADONLY($_, 1) for #basearray;
Internals::SvREADONLY(#basearray, 1);
Then mark it read-write again after the function has finished.
Internals::SvREADONLY(#basearray, 0);
Internals::SvREADONLY($_, 0) for #basearray;
Using Internals::SvREADONLY(#basearray, $bool) modifies the read-only state of the array itself, preventing elements from being added or removed from it; Internals::SvREADONLY($_, $bool) for #basearray modifies the read-only state of each element in the array too, which you probably want.
Of course, if your array contains references like blessed objects, you then need to consider whether you need to recurse into the references, marking them read-only too. (But can also be a concern with the shallow copy of the array I mentioned in the preferred solution!)
So yes, it is possible to prevent a sub from accidentally modifying a variable by marking that variable read-only before calling the sub, but it's a better idea to restructure your code so the sub simply doesn't have access to the variable at all.
Yes, but.
Here is a prototype that uses #TLP's answer.
#!/usr/bin/perl
use strict; use warnings;
{ # block_main BEG
my #basearray = qw / amoeba elephants sequoia /;
print join ( ' ', 'in main, #basearray==', join ( ' ', #basearray ), "\n" );
print "Now we call subroutine to print it:\n"; enumerateprintarray ( \#basearray );
my $ref_basearray = changearray ( \#basearray, 'wolves or coyotes . . . ' );
#basearray = #$ref_basearray;
print "Now we call subroutine to print it:\n"; enumerateprintarray ( \#basearray );
} # block_main END
sub enumerateprintarray
{
my $sb_name = (caller(0))[3];
#print join ( '' , #basearray ); # mortal sin! for in the day that thou eatest thereof thou shalt surely die.
my $sb_exact_count_arg = 1;
die "$sb_name must have exactly $sb_exact_count_arg arguments" unless ( ( scalar #_ ) == $sb_exact_count_arg );
my $sf_array = $_[0];
my #sb_array = #$sf_array;
my $sb_count = 0;
foreach (#sb_array)
{
$sb_count++;
print "\t$sb_count:\t$_\n";
}
}
sub changearray
{
my $sb_name = (caller(0))[3];
#print join ( '' , #basearray ); # in the day that thou eatest thereof thou shalt surely die.
my $sb_exact_count_arg = 2;
die "$sb_name must have exactly $sb_exact_count_arg arguments" unless ( ( scalar #_ ) == $sb_exact_count_arg );
my ( $sf_array, $addstring ) = #_;
my #sb_array = #$sf_array;
push #sb_array, $addstring;
return ( \#sb_array );
}

How can I ensure my method calls use the right method name on the right object?

I am working on a program which makes multiple attempts at processing, storing to a new log each time it tries (several other steps before/after).
use strict;
for (my $i = 0; $i < 3; $i++)
{
my $loggerObject = new MyLoggerObject(tag => $i);
#.. do a bunch of other things ..
Process($loggerObject,$i);
#.. do a bunch of other things ..
}
sub Process
{
my ($logger,$thingToLog) = #_;
sub Logger { $logger->Print($_[0]); }
Logger("Processing $thingToLog");
}
package MyLoggerObject;
sub new
{
my $package = shift;
my %hash = (#_); my $self = \%hash;
return bless $self, $package;
}
sub Print
{
my $self = shift;
my $value = shift;
print "Entering into log ".$self->{tag}.": $value\n";
}
1;
To avoid having to do a bunch of $self->{logger}->Print() and risk misspelling Print, I tried to collapse them into the local subroutine as seen above. However, when I run this I get:
perl PerlLocalMethod.pl
Entering into log 0: Processing 0
Entering into log 0: Processing 1
Entering into log 0: Processing 2
instead of:
perl PerlLocalMethod.pl
Entering into log 0: Processing 0
Entering into log 1: Processing 1
Entering into log 1: Processing 2
I am presuming the problem is that the Logger method is 'compiled' the first time I call the Process method with the object reference I used on the first call but not afterwards.
If I did $logger->Print(), misspelling Print, and hit a codepath I can't reliably test (this is for an embedded system and I can't force every error condition) it would error out the script with an undefined Method. I suppose I could use AUTOLOAD within logger and log any bad Method calls, but I'd like to know any other recommendations on how to make sure my Logger() calls are reliable and using the correct object.
In Perl, subroutines are compiled during compile time. Embedding a named subroutine declaration into a subroutine doesn't do what one would expect and isn't recommended.
If you are afraid of typos, write tests. See Test::More on how to do it. Use mocking if you can't instantiate system specific classes on a dev machine. Or use shorter names, like P.
You can declare the Logger in the highest scope as a closure over $logger that you would need to declare there, too:
my $logger;
sub Logger { $logger->Print($_[0]) }
But it's confusing and can lead to code harder to maintain if there are many variables and subroutines like that.
If you had used use warnings in your code you would have seen the message:
Variable "$logger" will not stay shared at logger line 24.
Which would have alerted you to the problem (moral: always use strict and use warnings).
I'm not entirely sure why you need so many levels of subroutines in order to do your logging, but it seems to me that all of your subroutines which take the $logger object as their first parameter should probably by methods on the MyLoggerObject (which should probably be called MyLoggerClass as it's a class, not an object).
If you do that, then you end up with this code (which seems to do what you want):
use strict;
use warnings;
for my $i (0 .. 2) {
my $loggerObject = MyLoggerClass->new(tag => $i);
#.. do a bunch of other things ..
$loggerObject->Process($i);
#.. do a bunch of other things ..
}
package MyLoggerClass;
sub new {
my $package = shift;
my $self = { #_ };
return bless $self, $package;
}
sub Process {
my $self = shift;
my ($thingToLog) = #_;
$self->Logger("Processing $thingToLog");
}
sub Logger {
my $self = shift;
$self->Print($_[0]);
}
sub Print {
my $self = shift;
my ($value) = #_;
print "Entering into log $self->{tag}: $value\n";
}
1;
Oh, and notice that I moved away from the indirect object notation call (new Class(...)) to the slightly safer Class->new(...). The style you used will work in the vast majority of cases, but when it doesn't you'll waste days trying to fix the problem.
As already explained above, using lexical defined variables in these kinds of method is not possible.
If you have to "duct-tape" this problem you could use global Variables (our instead of my).
sub Process
{
our ($logger,$thingToLog) = #_;
sub Logger { $logger->Print($_[0]); }
Logger("Processing $thingToLog");
}
But be aware that $logger and $thingToLog are now global variables accessible outside this function.

What happens if you call shift inside an anonymous sub?

First off, apologies if this question is ill-posed; I don't actually know a heck of a lot of perl.
I'm trying to debug some existing code that is supposed to send grades from our online homework system called WeBWorK to an LMS. I'm running into a weird error where I think something isn't getting initialized right, or perhaps isn't the right class. I suspect that the problem might be here:
sub go {
my $self = shift;
my $r = $self->r;
my $ce = $r->ce;
# If grades are begin passed back to the lti then we peroidically
# update all of the grades because things can get out of sync if
# instructors add or modify sets.
if ($ce->{LTIGradeMode}) {
my $grader = WeBWorK::Authen::LTIAdvanced::SubmitGrade->new($r);
my $post_connection_action = sub {
my $grader = shift;
# catch exceptions generated during the sending process
my $result_message = eval { $grader->mass_update() };
if ($#) {
# add the die message to the result message
$result_message .= "An error occurred while trying to update grades via LTI.\n"
. "The error message is:\n\n$#\n\n";
# and also write it to the apache log
$r->log->error("An error occurred while trying to update grades via LTI: $#\n");
}
};
if (MP2) {
$r->connection->pool->cleanup_register($post_connection_action, $grader);
} else {
$r->post_connection($post_connection_action, $grader);
}
}
... # a bunch of other stuff happens in the "go" sub
I kinda suspect that the issue is with the $grader variable; in particular, I don't know what my $grader = shift; does inside an anonymous sub. Like, if the sub had a name, it would be more clear that shift is giving the first argument passed to the sub. But since it's anonymous, I don't know what it thinks its arguments are.
Further, I'm not really sure why that line is needed at all. Like, from my googling, I'm given to understand that the point of an anonymous sub is to keep all the variables from the surrounding environment in scope. So why do we need to redefine $grader inside the anonymous sub in the first place?
Thanks for helping a perl noob out! :)
There's nothing special about anon subs in this regard.
my $cr = sub {
my $arg = shift;
say $arg;
};
$cr->("foo"); # Prints "foo"
$cr->("bar"); # Prints "bar"
In your case, you pass $post_connection_action and $grader to cleanup_register or post_connection with the expectation that it will result in a call to &$post_connection_action with $grader as its first argument. Whether the expectation is correct or not depends on the implementation of cleanup_register and post_connection, of which I know nothing.
Note that another solution presents itself here. Subs have access to the lexicals that were in scope when the sub operator was evaluated.
my $prefix = "> ";
my $cr = sub {
my $arg = shift;
say "$prefix$arg"; # Captures $prefix from sub{} scope.
};
$cr->("foo"); # Prints "> foo"
The above is true even if captured lexicals would otherwise no longer exist by the time the sub is called.
my $cr;
{
my $prefix = "> ";
$cr = sub {
my $arg = shift;
say "$prefix$arg"; # Captures $prefix from sub{} scope.
};
} # $prefix would normally stop existing here.
$cr->("foo"); # Prints "> foo"
That means you don't need to pass $grader as an argument. It can simply be captured. Just leave out my $grader = shift; (and don't pass $grader to
cleanup_register or post_connection).

In Perl, what is the most reliable way to determine a coderef's package?

I have a number of higher order utility functions that take in a code reference and apply that code to some data. Some of these functions require localizing variables during the execution of the subroutines. At the beginning, I was using caller to determine which package to localize into, in a similar manner as shown in this example reduce function:
sub reduce (&#) {
my $code = shift;
my $caller = caller;
my ($ca, $cb) = do {
no strict 'refs';
map \*{$caller.'::'.$_} => qw(a b)
};
local (*a, *b) = local (*$ca, *$cb);
$a = shift;
while (#_) {
$b = shift;
$a = $code->()
}
$a
}
Initially this technique worked fine, however as soon as I tried writing a wrapper function around the higher order function, figuring out the correct caller becomes complicated.
sub reduce_ref (&$) {&reduce($_[0], #{$_[1]})}
Now in order for reduce to work, I would need something like:
my ($ca, $cb) = do {
my $caller = 0;
$caller++ while caller($caller) =~ /^This::Package/;
no strict 'refs';
map \*{caller($caller).'::'.$_} => qw(a b)
};
At this point it became a question of which packages to skip, combined with the discipline of never using the function from within those packages. There had to be a better way.
It turns out that the subroutine the higher order functions take as an argument contains enough meta-data to solve the problem. My current solution is using the B introspection module to determine the compiling stash of the passed in subroutine. That way, no-matter what happens between compilation of the code and its execution, the higher order function always knows the correct package to localize into.
my ($ca, $cb) = do {
require B;
my $caller = B::svref_2object($code)->STASH->NAME;
no strict 'refs';
map \*{$caller.'::'.$_} => qw(a b)
};
So my ultimate question is if this is the best way of determining the caller's package in this situation? Is there some other way that I have not thought of? Is there some bug waiting to happen with my current solution?
First, you can use the following and not need any changes:
sub reduce_ref (&$) { #_ = ( $_[0], #{$_[1]} ); goto &reduce; }
But generally speaking, the following is indeed exactly what you want:
B::svref_2object($code)->STASH->NAME
You want the $a and $b variables of the sub's __PACKAGE__, so you want to know the sub's __PACKAGE__, and that's exactly what that returns. It even fixes the following:
{
package Utils;
sub mk_some_reducer {
...
return sub { ... $a ... $b ... };
}
}
reduce(mk_some_reducer(...), ...)
It doesn't fix everything, but that's impossible without using arguments instead of $a and $b.
In case anyone needs them, here are the functions that I eventually decided to use:
require B;
use Scalar::Util 'reftype';
use Carp 'croak';
my $cv_caller = sub {
reftype($_[0]) eq 'CODE' or croak "not code: $_[0]";
B::svref_2object($_[0])->STASH->NAME
};
my $cv_local = sub {
my $caller = shift->$cv_caller;
no strict 'refs';
my #ret = map \*{$caller.'::'.$_} => #_;
wantarray ? #ret : pop #ret
};
Which would be used as:
my ($ca, $cb) = $code->$cv_local(qw(a b));
in the context of the original question.

How can I code in a functional style in Perl?

How do you either:
have a sub return a sub
or
execute text as code
in Perl?
Also, how do I have an anonymous function store state?
A sub returns a sub as a coderef:
# example 1: return a sub that is defined inline.
sub foo
{
return sub {
my $this = shift;
my #other_params = #_;
do_stuff();
return $some_value;
};
}
# example 2: return a sub that is defined elsewhere.
sub bar
{
return \&foo;
}
Arbitrary text can be executed with the eval function: see the documentation at perldoc -f eval:
eval q{print "hello world!\n"};
Note that this is very dangerous if you are evaluating anything extracted from user input, and is generally a poor practice anyway as you can generally define your code in a coderef as in the earlier examples above.
You can store state with a state variable (new in perl5.10), or with a variable scoped higher than the sub itself, as a closure:
use feature 'state';
sub baz
{
state $x;
return ++$x;
}
# create a new scope so that $y is not visible to other functions in this package
{
my $y;
sub quux
{
return ++$y;
}
}
Return a subroutine reference.
Here's a simple example that creates sub refs closed over a value:
my $add_5_to = add_x_to(5);
print $add_5_to->(7), "\n";
sub add_x_to {
my $x = shift;
return sub { my $value = shift; return $x + $value; };
}
You can also work with named subs like this:
sub op {
my $name = shift;
return $op eq 'add' ? \&add : sub {};
}
sub add {
my $l = shift;
my $r = shift;
return $l + $r;
}
You can use eval with an arbitrary string, but don't do it. The code is hard to read and it restarts compilation, which slows everything down. There are a small number of cases where string eval is the best tool for the job. Any time string eval seems like a good idea, you are almost certainly better off with another approach.
Almost anything you would like to do with string eval can be achieved with closures.
Returning subs is easy by using the sub keyword. The returned sub closes over the lexical variables it uses:
#!/usr/bin/perl
use strict; use warnings;
sub mk_count_from_to {
my ($from, $to) = #_;
return sub {
return if $from > $to;
return $from ++;
};
}
my $c = mk_count_from_to(-5, 5);
while ( defined( my $n = $c->() ) ) {
print "$n\n";
}
5.10 introduced state variables.
Executing text as Perl is accomplished using eval EXPR:
the return value of EXPR is parsed and executed as if it were a little Perl program. The value of the expression (which is itself determined within scalar context) is first parsed, and if there weren't any errors, executed in the lexical context of the current Perl program, so that any variable settings or subroutine and format definitions remain afterwards. Note that the value is parsed every time the eval executes
Executing arbitrary strings will open up huge gaping security holes.
You can create anonymous subroutines and access them via a reference; this reference can of course be assigned to a scalar:
my $subref = sub { ... code ... }
or returned from another subroutine
return sub { ... code ... }
If you need to store states, you can create closures with lexical variables defined in an outer scope like:
sub create_func {
my $state;
return sub { ... code that can refer to $state ... }
}
You can run code with eval