I've had to edit some Perl scripts for my work. I had never written a single line of Perl but I could get my thing working, until I bumped into a weird issue. Consider the following function
sub trim {
my $s = shift;
$s =~ s/^\s+|\s+$//g;
return $s
}
with the two following ways of calling it :
my $previousVersionHash = readline( *CONFIG );
$previousVersionHash = trim($previousVersionHash);
$maxClients = readline( *CONFIG );
and
my $previousVersionHash = trim(readline( *CONFIG ));
$maxClients = readline( *CONFIG );
In the first case, it works fine but with the second snippet I get warnings and $maxClients is empty instead of containing the second line of the CONFIG file.
Why this strange behaviour ? Note that this also happened when I used sub trim($). By the way, I don't know the difference between those two declarations but this is documented so I could take a look if it matters for this question.
Perl will let the function it calls now if it wants an list or a scalar returned. You check this with wantarray. readline reads all the lines into an array when you call it in list context, like you do in your second example.
The second way is calling it like this, since (like #ThisSuitIsBlackNot's comment says) arguments to user-defined subroutines are always evaluated in list context (assuming no prototypes):
my #array = readline( *CONFIG );
trim(#array);
So here all the rest of the lines of *CONFIG is put into the array #array. Your trim function discards all other values than the first of them.
To solve this and force the call to trim to be in scalar context do like this:
my $previousVersionHash = trim(scalar(readline( *CONFIG )));
or continue to call it like you do.
To update trim to support arrays you could try something like:
sub trim {
my #a;
foreach my $s (#_) {
$s =~ s/^\s+|\s+$//g;
push #a, $s;
}
return #a;
}
Related
In the following script, I declare and modify #basearray in the main program. Inside the dosomething subroutine, I access #basearray, assign it to an array local to the script, and modify the local copy. Because I have been careful to change the value only of local variables inside the subroutine, #basearray is not changed.
If I had made the mistake of assigning a value to #basearray inside the subroutine, though, it would have been changed and that value would have persisted after the call to the subroutine.
This is demonstrated in the 2nd subroutine, doagain.
Also, doagain receives the reference \#basearray as an argument rather than accessing #basearray directly. But going to that additional trouble provides no additional safety. Why do it that way at all?
Is there a way to guarantee that I cannot inadvertently change #basearray inside any subroutine? Any kind of hard safety device that I can build into my code, analogous to use strict;, some combination perhaps of my and local?
Am I correct in thinking that the answer is No, and that the only solution is to not make careless programmer errors?
#!/usr/bin/perl
use strict; use warnings;
my #basearray = qw / amoeba /;
my $count;
{
print "\#basearray==\n";
$count = 0;
foreach my $el (#basearray) { $count++; print "$count:\t$el\n" };
}
sub dosomething
{
my $sb_name = (caller(0))[3];
print "entered $sb_name\n";
my #sb_array=( #basearray , 'dog' );
{
print "\#sb_array==\n";
$count = 0;
foreach my $el (#sb_array) { $count++; print "$count:\t$el\n" };
}
print "return from $sb_name\n";
}
dosomething();
#basearray = ( #basearray, 'rats' );
{
print "\#basearray==\n";
$count = 0;
foreach my $el (#basearray) { $count++; print "$count:\t$el\n" };
}
sub doagain
{
my $sb_name = (caller(0))[3];
print "entered $sb_name\n";
my $sf_array=$_[0];
my #sb_array=#$sf_array;
#sb_array=( #sb_array, "piglets ... influenza" );
{
print "\#sb_array==\n";
$count = 0;
foreach my $el (#sb_array) { $count++; print "$count:\t$el\n" };
}
print "now we demonstrate that passing an array as an argument to a subroutine does not protect it from being globally changed by programmer error\n";
#basearray = ( #sb_array );
print "return from $sb_name\n";
}
doagain( \#basearray );
{
print "\#basearray==\n";
$count = 0;
foreach my $el (#basearray) { $count++; print "$count:\t$el\n" };
}
There isn't a pragma or a keyword or such, but there are well established "good practices," which in this case completely resolve what you reasonably ponder about.
The first sub, dosomething, commits the sin of using variables visible in its scope but defined in the higher scope. Instead, always pass needed data to a subroutine (exceptions are rare, in crystal clear cases).
Directly using data from "outside" defies the idea of a function as an encapsulated procedure, exchanging data with its users via a well defined and clear interface. It entangles ("couples") sections of code that are in principle completely unrelated. In practice, it can also be outright dangerous.
Also, the fact the #basearray is up for grabs in the sub is best considered an accident -- what when that sub gets moved to a module? Or another sub is introduced to consolidate code where #basearray is defined?
The second sub, doagain, nicely takes a reference to that array. Then, to protect the data in the caller, one can copy the caller's array to another one which is local to the sub
sub doagain {
my ($ref_basearray) = #_;
my #local_ba = #$ref_basearray;
# work with local_ba and the caller's basearray is safe
}
The names of local lexical variables are of course arbitrary, but a convention where they resemble the caller's data names may be useful.
Then you can adopt a general practice, for safety, to always copy input variables to local ones. Work directly with references that are passed in only when you want to change the caller's data (relatively rare in Perl). This may hurt efficiency if it's done a lot with sizeable data, or when really large data structures are involved. So perhaps then make an exception and change data via its reference, and be extra careful.
(Putting my comment as answer)
One way to guarantee not changing a variable inside a subroutine is to not change it. Use only lexically scoped variables inside the subroutine, and pass whatever values you need inside the subroutine as arguments to the subroutine. It is a common enough coding practice, encapsulation.
One idea that you can use -- mainly as practice, I would say -- to force yourself to use encapsulation, is to put a block around your "main" code, and place subroutines outside of it. That way, if you should accidentally refer to a (formerly) global variable, use strict will be able to do it's job and produce a fatal error. Before runtime.
use strict;
use warnings;
main: { # lexical scope reduced to this block
my #basearray = qw / amoeba /;
print foo(#basearray); # works
print bar(); # fatal error
} # END OF MAIN lexical scope of #basearray ends here
sub foo {
my #basearray = #_; # encapsulated
return $basearray[1]++;
}
sub bar {
return $basearray[1]++; # out of scope ERROR
}
This will not compile, and will produce the error:
Global symbol "#basearray" requires explicit package name at foo.pl line 15.
Execution of foo.pl aborted due to compilation errors.
I would consider this a training device to force yourself to using good coding practices, and not something to necessarily use in production code.
There are several solutions with various levels of pithiness from "just don't change it" to "use an object or tied array and lock down the update functions". An intermediate solution, not unlike using an object with a getter method, is to define a function that returns your array but can only operate as an rvalue, and to use that function inside subroutines.
my #basearray = (...);
sub basearray { return #basearray }
sub foo {
foreach my $elem (basearray()) {
...
}
#bar = map { $_ *= 2 } basearray(); # ok
#bar = map { $_ *= 2 } #basearray; # modifies #basearray!
}
TLDR: yes, but.
I'll start with the "but". But it's better to design your code so that the variable simply doesn't exist in the scope where the untrusted function is defined.
sub untrusted_function {
...
}
my #basearray = qw( ... ); # declared after untrusted_function
If untrusted_function needs to be able to access the contents of the array, pass it a copy of the array as a parameter, so it can't modify the original.
Now here's the "yes".
You can mark the array as read-only before calling the untrusted function.
Internals::SvREADONLY($_, 1) for #basearray;
Internals::SvREADONLY(#basearray, 1);
Then mark it read-write again after the function has finished.
Internals::SvREADONLY(#basearray, 0);
Internals::SvREADONLY($_, 0) for #basearray;
Using Internals::SvREADONLY(#basearray, $bool) modifies the read-only state of the array itself, preventing elements from being added or removed from it; Internals::SvREADONLY($_, $bool) for #basearray modifies the read-only state of each element in the array too, which you probably want.
Of course, if your array contains references like blessed objects, you then need to consider whether you need to recurse into the references, marking them read-only too. (But can also be a concern with the shallow copy of the array I mentioned in the preferred solution!)
So yes, it is possible to prevent a sub from accidentally modifying a variable by marking that variable read-only before calling the sub, but it's a better idea to restructure your code so the sub simply doesn't have access to the variable at all.
Yes, but.
Here is a prototype that uses #TLP's answer.
#!/usr/bin/perl
use strict; use warnings;
{ # block_main BEG
my #basearray = qw / amoeba elephants sequoia /;
print join ( ' ', 'in main, #basearray==', join ( ' ', #basearray ), "\n" );
print "Now we call subroutine to print it:\n"; enumerateprintarray ( \#basearray );
my $ref_basearray = changearray ( \#basearray, 'wolves or coyotes . . . ' );
#basearray = #$ref_basearray;
print "Now we call subroutine to print it:\n"; enumerateprintarray ( \#basearray );
} # block_main END
sub enumerateprintarray
{
my $sb_name = (caller(0))[3];
#print join ( '' , #basearray ); # mortal sin! for in the day that thou eatest thereof thou shalt surely die.
my $sb_exact_count_arg = 1;
die "$sb_name must have exactly $sb_exact_count_arg arguments" unless ( ( scalar #_ ) == $sb_exact_count_arg );
my $sf_array = $_[0];
my #sb_array = #$sf_array;
my $sb_count = 0;
foreach (#sb_array)
{
$sb_count++;
print "\t$sb_count:\t$_\n";
}
}
sub changearray
{
my $sb_name = (caller(0))[3];
#print join ( '' , #basearray ); # in the day that thou eatest thereof thou shalt surely die.
my $sb_exact_count_arg = 2;
die "$sb_name must have exactly $sb_exact_count_arg arguments" unless ( ( scalar #_ ) == $sb_exact_count_arg );
my ( $sf_array, $addstring ) = #_;
my #sb_array = #$sf_array;
push #sb_array, $addstring;
return ( \#sb_array );
}
I'm trying to mirror the website which having the files and folder to hash.
This one having example So I tried, the following
my $url = "http://localhost/mainfolder/";
my ($parent) = $url=~m/\/(\w+)\/?$/;
my %tree=(mainfolder=>[]);
folder_create($url);
sub folder_create
{
my $url = shift;
my $cont = get($url);
my ($child) = $url=~m/($parent.*)/;
$child=~s/\/?(\w+)\/?/{$1}/g;
while($cont=~m/(<tr.+?<\/tr>)/g)
{
my $line = $1;
if($line=~m/\[DIR\].*?href="([^"]*)"[^>]*>(.+?)<\/a>/)
{
my $sub =$1;
$sub=~s/\///;
print "$child\n\n";
push ( eval'#{$tree $child}',$sub);
}
}
}
use Data::Dumper;
print Dumper \%tree,"\n\n\n";
Update
Instead of messing with eval you should use the Data::Diver module
Because of the single quotes, you're trying to execute #{$hash$var} which isn't valid Perl.
If you wrote it as
push eval "\#{\$hash$var}", "somedata"
Then the eval would work, but it would evaluate to the contents of the array in hash element main, which is an empty list of values. That means your call would become
push( ( ), "somedata")
or just
push "somedata"
which is meaningless
This is a particularly unpleasant thing to want to do. Why do you think you need it?
I'm getting this error and cannot understand why this happens. It happens when I jump to another subroutine. Perhaps there is something I need to understand about Mojolicious on why this happens.
Here is the source code of my program:
#!/usr/bin/perl
use Mojolicious::Lite;
get '/' => sub { &start_home; };
app->start;
sub start_home {
my $d = shift;
my $something = $d->param('something');
### Do things with $something.... etc.. etc..
&go_somewhere_else; ### Go somewhere else
}
sub go_somewhere_else {
my $c = shift;
$c->render(text => "Hello World!");
### End of program
}
I am passing a value on to the renderer and there is a value - Why would it say it is undefined? My understanding is that this only happens if you jump to a subroutine and try to render output.
My operating system is Windows and I am using Strawberry Perl.
You need to pass the context object $c/$d to your second function. The undefined value is your $c in go_somewhere_else, because you call it without a parameter.
Initially, to make it work, do this.
sub start_home {
my $d = shift;
my $something = $d->param('something');
go_somewhere_else($d);
}
You are now passing the context, which you named $d (that's not the conventional name), to the other function, and the warning will go away.
That's because the form &subname; without parenthesis () makes #_ (that's the list of arguments to the function) available inside of go_somewhere_else, but because you shifted $d off, #_ is now empty, and hence your $c inside go_somewhere_else is undef.
Alternatively, you could also change the shift to an assignment with #_. But please, don't do that!
sub start_home {
my ( $d ) = #_;
my $something = $d->param('something');
&go_somewhere_else;
}
There are more things odd to the point of almost wrong here.
get '/' => sub { &start_home; };
You are currying the the start_home function, but you are not actually adding another parameter. I explained above why this works. But it's not great. In fact, it's confusing and complicated.
Instead, you should use a code reference for the route.
get '/' => \&start_home;
Inside of start_home, you should call your context $c as is the convention. You should also not use the ampersand & notation for calling functions. That changes the behavior in a way you most certainly do not want.
sub start_home {
my $c = shift;
my $something = $c->param('something');
# ...
go_somewhere_else($c);
}
To learn more about how function calls work in Perl, refer to perlsub.
This code does work. But my question is this: If I uncomment the two commented lines and comment out the next three lines, I would get a Can't modify non-lvalue subroutine and I would like to know why? I would save a variable and ride a ..., if I could use the commented lines.
Next question how would I make this more object oriented?
open FILE, "FBIDs" or die $!;
while (<FILE>) {
#csv = split /,/;
}
for (my $i=0;$i<$#csv;$i++) {
my $browser = LWP::UserAgent->new( );
my $url = "https://graph.facebook.com/$csv[$i]?fields=id,name\n";
my $response = $browser->get($url);
# $response->content=~s/[{}\"]//g;
# my #json = split (/[,:]/,$response->content);
my $resp=$response->content;
$resp=~s/[{}\"]//g;
my #json = split (/[,:]/,$resp);
print $json[1],", ",$json[3],"\n";
$browser->delete( );
}
close FILE;
Perl realizes you're trying to do something useless — modifying a value that's not stored anywhere — so it throws an error. Remember that $response->content is a method call (something that returns a value), not variable (storage aka lvalue).
$response->content or $response->content() is method call and you can't make substitution or change it.
On the other hand some perl functions can be treated in such way, and they are called lvalue subroutines.
I have a perl script (simplified) like so:
my $dh = Stats::Datahandler->new(); ### homebrew module
my %url_map = (
'/(article|blog)/' => \$dh->articleDataHandler,
'/video/' => \$dh->nullDataHandler,
);
Essentially, I'm going to loop through %url_map, and if the current URL matches a key, I want to call the function pointed to by the value of that key:
foreach my $key (keys %url_map) {
if ($url =~ m{$key}) {
$url_map{$key}($url, $visits, $idsite);
$mapped = 1;
last;
}
}
But I'm getting the message:
Can't use string ("/article/") as a subroutine ref while "strict refs" in use at ./test.pl line 236.
Line 236 happens to be the line $url_map{$key}($url, $visits, $idsite);.
I've done similar things in the past, but I'm usually doing it without parameters to the function, and without using a module.
Since this is being answered here despite being a dup, I may as well post the right answer:
What you need to do is store a code reference as the values in your hash. To get a code reference to a method, you can use the UNIVERSAL::can method of all objects. However, this is not enough as the method needs to be passed an invocant. So it is clearest to skip ->can and just write it this way:
my %url_map = (
'/(article|blog)/' => sub {$dh->articleDataHandler(#_)},
'/video/' => sub {$dh->nullDataHandler(#_)},
);
This technique will store code references in the hash that when called with arguments, will in turn call the appropriate methods with those arguments.
This answer omits an important consideration, and that is making sure that caller works correctly in the methods. If you need this, please see the question I linked to above:
How to take code reference to constructor?
You're overthinking the problem. Figure out the string between the two forward slashes, then look up the method name (not reference) in a hash. You can use a scalar variable as a method name in Perl; the value becomes the method you actually call:
%url_map = (
'foo' => 'foo_method',
);
my( $type ) = $url =~ m|\A/(.*?)/|;
my $method = $url_map{$type} or die '...';
$dh->$method( #args );
Try to get rid of any loops where most of the iterations are useless to you. :)
my previous answer, which I don't like even though it's closer to the problem
You can get a reference to a method on a particular object with can (unless you've implemented it yourself to do otherwise):
my $dh = Stats::Datahandler->new(); ### homebrew module
my %url_map = (
'/(article|blog)/' => $dh->can( 'articleDataHandler' ),
'/video/' => $dh->can( 'nullDataHandler' ),
);
The way you have calls the method and takes a reference to the result. That's not what you want for deferred action.
Now, once you have that, you call it as a normal subroutine dereference, not a method call. It already knows its object:
BEGIN {
package Foo;
sub new { bless {}, $_[0] }
sub cat { print "cat is $_[0]!\n"; }
sub dog { print "dog is $_[0]!\n"; }
}
my $foo = Foo->new;
my %hash = (
'cat' => $foo->can( 'cat' ),
'dog' => $foo->can( 'dog' ),
);
my #tries = qw( cat dog catbird dogberg dogberry );
foreach my $try ( #tries ) {
print "Trying $try\n";
foreach my $key ( keys %hash ) {
print "\tTrying $key\n";
if ($try =~ m{$key}) {
$hash{$key}->($try);
last;
}
}
}
The best way to handle this is to wrap your method calls in an anonymous subroutine, which you can invoke later. You can also use the qr operator to store proper regexes to avoid the awkwardness of interpolating patterns into things. For example,
my #url_map = (
{ regex => qr{/(article|blog)/},
method => sub { $dh->articleDataHandler }
},
{ regex => qr{/video/},
method => sub { $dh->nullDataHandler }
}
);
Then run through it like this:
foreach my $map( #url_map ) {
if ( $url =~ $map->{regex} ) {
$map->{method}->();
$mapped = 1;
last;
}
}
This approach uses an array of hashes rather than a flat hash, so each regex can be associated with an anonymous sub ref that contains the code to execute. The ->() syntax dereferences the sub ref and invokes it. You can also pass parameters to the sub ref and they'll be visible in #_ within the sub's block. You can use this to invoke the method with parameters if you want.