Recommended method to refactor variable name in Perl code? - perl

I can use Perltidy to reformat source. Quite useful.
If a source file uses a variable like #m, how can I most easily refactor that into something else, e.g. #miles_travelled?
Using a regular expression to rename does not appear safe, because a separate variable such as $m may also exist (with a different type, in this case a scalar), yet the #m variable can be referenced using an expression like $m[$i].
For example, none of the following will be correct for Perl code:
s/([\$\#])m/$1miles_travelled/g # Will rename scalar with same name
s/\$m/\$miles_travelled/g # Will fail to rename accesses of array
Is there a recommended tool or method for safely renaming a variable name in Perl code?

The variable $m always occurs as $m.
The variable #m always occurs as #m or $m[...].
The variable %m always occurs as %m or $m{...} or #m{...}.
… except with indirect method calls: new $m[...] parses as $m->new([...]). But we can probably ignore this case (use no indirect to make sure).
If we want to cover the first three cases properly, we can
replace a scalar by s/(?<=\$)OLDNAME(?!\s*[\[\{])/NEWNAME/g
replace an array by s/(?<=\#)OLDNAME(?!\{)|(?<=\$)OLDNAME(?=\s*\[)/NEWNAME/g
replace a hash by s/(?<=\%)OLDNAME|(?<=[\$\#])OLDNAME(?=\s*\{)/NEWNAME/g
Note that lookarounds or multiple passes for the different cases are neccessary.
Test:
use Test::More tests => 3;
my $scalar_re = qr/(?<=\$) foo (?!\s*[\[\{])/x;
my $array_re = qr/(?<=\#) foo (?!\{) | (?<=\$) foo (?=\s*\[)/x;
my $hash_re = qr/(?<=\%) foo | (?<=[\$\#]) foo (?=\s*\{)/x;
my $input = '$foo, $foo[1], #foo, $foo{a}, %foo, #foo{qw/a b/}';
my $scalar = '$bar, $foo[1], #foo, $foo{a}, %foo, #foo{qw/a b/}';
my $array = '$foo, $bar[1], #bar, $foo{a}, %foo, #foo{qw/a b/}';
my $hash = '$foo, $foo[1], #foo, $bar{a}, %bar, #bar{qw/a b/}';
is $input =~ s/$scalar_re/bar/xrg, $scalar;
is $input =~ s/$array_re /bar/xrg, $array;
is $input =~ s/$hash_re /bar/xrg, $hash;

The Padre editor will carry out a small number of simple refactorings automatically for you. "Rename variable" is one of them.

Related

Using perl `my` within actual function arguments

I want to use perl to build a document graph as readably as possible. For re-use of nodes, I want to refer to nodes using variables (or constants, if that is easier). The following code works and illustrates the idea with node types represented by literals or factory function calls to a and b. (For simple demo purposes, the functions do not create nodes but just return a string.)
sub a (#) {
return sprintf "a(%s)", join( ' ', #_ );
}
sub b (#) {
return sprintf "b(%s)", join( ' ', #_ );
}
printf "The document is: %s\n", a(
"declare c=",
$c = 1,
$e = b(
"use",
$c,
"to declare d=",
$d = $c + 1
),
"use the result",
$d,
"and document the procedure",
$e
);
The actual and expected output of this is The document is: a(declare c= 1 b(use 1 to declare d= 2) use the result 2 and document the procedure b(use 1 to declare d= 2)).
My problem arises because I want to use strict in the whole program so that variables like $c, $d, $e must be declared using my. I can, of course, write somewhere close to the top of the text my ( $c, $d, $e );. It would be more efficient at edit-time when I could use the my keyword directly at the first mention of the variable like so:
…
printf "The document is: %s\n", a(
"declare c=",
my $c = 1,
my $e = b(
"use",
$c,
"to declare d=",
my $d = $c + 1
),
"use the result",
$d,
"and document the procedure",
$e
);
This would be kind of my favourite syntax. Unfortunately, this code yields several Global symbol "…" requires explicit package name errors. (Moreover, according to documentation, my does not return anything.)
I have the idea of such use of my from uses like in open my $file, '<', 'filename.txt' or die; or in for ( my $i = 0; $i < 100; ++$i ) {…} where declaration and definition go in one.
Since the nodes in the graph are constants, it is acceptable to use something else than lexical variables. (But I think perl's built-in mechanims are strongest and most efficient for lexical variables, which is why I am inclined into this direction.)
My current idea to solve the issue is to define a function named something like define which behind the scenes would manipulate the current set of lexical variables using PadWalker or similar. Yet this would not allow me to use a natural perl like syntax like $c = 1, which would be my preferred syntax.
I am not certain of the exact need but here's one simple way for similar manipulations.
The example in the OP wants a named variable inside the function call statement itself, so that it can be used later in that statement for another call etc. If you must have it that way then you can use a do block to work out your argument list
func1(
do {
my $x = 5;
my $y = func2($x); # etc
say "Return from the do block what is then passed as arguments...";
$x, $y
}
);
This allows you to do things of the kind that your example indicates.†
If you also want to have names available in the subroutine then pass a hash (or a hashref), with suitably chosen key names for variables, and in the sub work with key names.
Alternatively, consider normally declaring your variables ahead of the function call. There's no bad thing about it while there are many good things. Can throw in a little wrapper and make it look nice, too.
† More specifically
printf "The document is: %s\n", a( do {
my $c = 1;
my $d = $c + 1;
my $e = b( "use", $c, "to declare d=", $d );
# Return a list from this `do`, which is then passed as arguments to a()
"declare c=", $c, $e, "use the result", $d,"and document the procedure", $e
} );
(condensed into fewer lines for posting here)
This do block is a half-way measure toward moving this code into a subroutine, as I presume that there are reasons to want this inlined. However, since comments indicate that the reality is even more complex I'd urge you to write a normal sub instead (in which a graph can be built, btw).
according to documentation, my does not return anything
The documentation doesn't say that, and it's not the case.
Haven't you ever done my $x = 123;? If so, you've assigned to the result of my $x. my simply returns the newly created variable as an lvalue (assignable value), so my $x simply returns $x.
Unfortunately, this code yields several [strict vars] errors.
Symbols (variables) created by my are only visible starting with the following statement.
For better of for worse, it allows the following:
my $x = 123;
{
my $x = $x;
$x *= 2;
say $x; # 246
}
say $x; # 123
I want to use perl to build a document graph as readably as possible.
So why not do that? Right now, you are building a string, not a graph. Build a graph of objects that resolve to a string after the graph has been constructed. You can build those object with a tree of sub calls (declare( c => [ use( c => ... ), ... ] )). I'd give a better example, but the grammar of what you are generating isn't clear to me.
Your argument list makes two references each to $c, $d and $e. If you prefix the first reference with my, it will be out of scope by the time Perl gets around to parsing the second reference it won't be in scope until the next statement, so the second reference would refer to a different variable (which may violate strict vars).
Declare my ($c,$d,$e) before your function call. There is nothing wrong or inelegant about doing that.

2 Sub references as arguments in perl

I have perl function I dont what does it do?
my what does min in perl?
#ARVG what does mean?
sub getArgs
{
my $argCnt=0;
my %argH;
for my $arg (#ARGV)
{
if ($arg =~ /^-/) # insert this entry and the next in the hash table
{
$argH{$ARGV[$argCnt]} = $ARGV[$argCnt+1];
}
$argCnt++;
}
return %argH;}
Code like that makes David sad...
Here's a reformatted version of the code doing the indentations correctly. That makes it so much easier to read. I can easily tell where my if and loops start and end:
sub getArgs {
my $argCnt = 0;
my %argH;
for my $arg ( #ARGV ) {
if ( $arg =~ /^-/ ) { # insert this entry and the next in the hash table
$argH{ $ARGV[$argCnt] } = $ARGV[$argCnt+1];
}
$argCnt++;
}
return %argH;
}
The #ARGV is what is passed to the program. It is an array of all the arguments passed. For example, I have a program foo.pl, and I call it like this:
foo.pl one two three four five
In this case, $ARGV is set to the list of values ("one", "two", "three", "four", "five"). The name comes from a similar variable found in the C programming language.
The author is attempting to parse these arguments. For example:
foo.pl -this that -the other
would result in:
$arg{"-this"} = "that";
$arg{"-the"} = "other";
I don't see min. Do you mean my?
This is a wee bit of a complex discussion which would normally involve package variables vs. lexically scoped variables, and how Perl stores variables. To make things easier, I'm going to give you a sort-of incorrect, but technically wrong answer: If you use the (strict) pragma, and you should, you have to declare your variables with my before they can be used. For example, here's a simple two line program that's wrong. Can you see the error?
$name = "Bob";
print "Hello $Name, how are you?\n";
Note that when I set $name to "Bob", $name is with a lowercase n. But, I used $Name (upper case N) in my print statement. As it stands, now. Perl will print out "Hello, how are you?" without a care that I've used the wrong variable name. If it's hard to spot an error like this in a two line program, imagine what it would be like in a 1000 line program.
By using strict and forcing me to declare variables with my, Perl can catch that error:
use strict;
use warnings; # Another Pragma that should always be used
my $name = "Bob";
print "Hello $Name, how are you doing\n";
Now, when I run the program, I get the following error:
Global symbol "$Name" requires explicit package name at (line # of print statement)
This means that $Name isn't defined, and Perl points to where that error is.
When you define variables like this, they are in scope with in the block where it's defined. A block could be the code contained in a set of curly braces or a while, if, or for statement. If you define a variable with my outside of these, it's defined to the end of the file.
Thus, by using my, the variables are only defined inside this subroutine. And, the $arg variable is only defined in the for loop.
One more thing:
The person who wrote this should have used the Getopt::Long module. There's a major bug in their code:
For example:
foo.pl -this that -one -two
In this case, my hash looks like this:
$args{'-this'} = "that";
$args{'-one'} = "-two";
$args{'-two'} = undef;
If I did this:
if ( defined $args{'-two'} ) {
...
}
I would not execute the if statement.
Also:
foo.pl -this=that -one -two
would also fail.
#ARGV is a special variable (refer to perldoc perlvar):
#ARGV
The array #ARGV contains the command-line arguments intended for the
script. $#ARGV is generally the number of arguments minus one, because
$ARGV[0] is the first argument, not the program's command name itself.
See $0 for the command name.
Perl documentation is also available from your command line:
perldoc -v #ARGV

Data types for parameters of subroutines / functions?

In Perl, can one specifiy data types for the parameters of subroutines? E.g. when using a dualvar in a numeric context like exit:
use constant NOTIFY_DIE_MAIL_SEND_FAILED => dualvar 3, 'NOTIFY_DIE_MAIL_SEND_FAILED';
exit NOTIFY_DIE_MAIL_SEND_FAILED;
How does Perl in that case know, that exit expects a numeric parameter? I didn't see a way to define data types for the parameters of subroutines like you do it in Java? (where I could understand how the data type is known as it is explicitely defined)
The whole point of the dualvar is that it behaves as a number or text depending on what you want. In cases where that's not obvious (to you more importantly than to perl) then make it clear.
exit 0 + NOTIFY_DIE_MAIL_SEND_FAILED;
As for explicitly typing parameters, that's not something built in. Perl is a much more dynamic language than Java so it's not common to check/force the type of every parameter or variable. In particular, a perl sub can accept different numbers of parameters and even different structures.
If you want to validate parameters (for an external API for example) try something like Params::Validate
In addition, Moose and Moo allow a certain level of attribute typing and even coercion.
In Perl, scalars are both numeric and stringy at the same time. It is not the variables themselves that distinguish between strings and numbers, but the operators you work with. While the addition + only uses a number, the concatenation . only uses strings.
In more strongly typing languages, e.g. Java, the addition operator doubles as addition and concatenation operator, because it can access type information.
"1" + 2 + 3 is still sick in Java, whereas Perl can cleanly distinguish between "1" + 2 + 3 == 6 and "1" . 2 . 3 eq "123".
You can force numeric or stringy context of a variable by adding 0 or concatenating the empty string:
sub foo {
my ($var) = #_;
$var += 0; # $var is numeric
$var .= ""; # $var is stringy now
}
Perl is quite different from Java in that - Perl is dynamically typed language, because it does not requires its variables to be typed at compile time..
Whereas, Java is statically typed (as you know already)
Perl determines the type of the variable depending upon the context it is used..
There can be only two context: -
List Context
Scalar Context
And the context is defined by the operator or function that is used..
For EG:-
# Define a list
#arr = qw/rohit jain/;
# Define a scalar
$num = 2
# Here perl will evaluate #arr in scalar context and take its length..
# so, below code will evaluate to : - value = 2 / 2
$value = #arr / $num;
# Here since it is used with a foreach loop, #arr will be taken as in list context
foreach (#arr) {
say $_;
}
# Above foreach loop will output: - `rohit` \n `jain` to the console..
You can force the type by:
use Scalar::Util qw(dualvar);
use constant NOTIFY_DIE_MAIL_SEND_FAILED => dualvar 3, 'NOTIFY_DIE_MAIL_SEND_FAILED';
say NOTIFY_DIE_MAIL_SEND_FAILED;
say int(NOTIFY_DIE_MAIL_SEND_FAILED);
output:
NOTIFY_DIE_MAIL_SEND_FAILED
3
How does Perl in that case know, that exit expects a numeric parameter?
exit expect a number as is part of its specification and its behaviour is kind of undefined if you pass it a non-integer value (i.e. you should not do it.
Now, in this particular case, how does dualvar manages to return either value type depending of the context?
I don't know how Scalar::Util's dualvar is implemented but you can write something similar with overload instead.
You certainly can modify the behaviour for a blessed object:
#!/usr/bin/env perl
use strict;
use warnings;
{package Dualvar;
use overload
fallback => 1,
'0+' => sub { $_[0]->{INT_VAL} },
'""' => sub { $_[0]->{STR_VAL} };
sub new {
my $class = shift;
my $self = { INT_VAL => shift, STR_VAL => shift };
bless($self,$class);
}
1;
}
my $x = Dualvar->new(31,'Therty-One');
print $x . " + One = ",$x + 1,"\n"; # Therty-One + One = 32
From the docs, it seems that overload actually changes the behaviour within the declaration scope so you should be able to change the behaviour of some common operators locally for any operand.
If exit does use one of those overloadable operations to evaluate its parameter into a integer then this solution would do.
I didn't see a way to define data types for the parameters of subroutines like you do it in Java?
As already said by others... this is not the case in Perl, at least not at compilation time, except for subroutine prototypes but these don't offer much type granularity (like int vs strings or different object classes).
Richard has mentioned some run-time alternatives you may use. I personally would recommend Moose if you don't mind the performance penalty.
What Rohit Jain said is correct. A function that wants input to follow certain rules simply has to explicitly check that the input is valid.
For example
sub foo
{
my ($param1,$param2) = shift;
$param1 =~ /^\d+$/ or die "Parameter 1 must be a positive integer.";
$param2 =~ /^(bar|baz)$/ or die "Parameter 2 must be either 'bar' or 'baz'";
...
}
This may seem like a pain, but:
The extra flexibility gained generally outweighs the work involved in doing this.
Simply having the correct data type is often not enough to ensure that you valid input, so you end up doing a lot this anyway even in a language like Java.

Scope of the default variable $_ in Perl

I have the following method which accepts a variable and then displays info from a database:
sub showResult {
if (#_ == 2) {
my #results = dbGetResults($_[0]);
if (#results) {
foreach (#results) {
print "$count - $_[1] (ID: $_[0])\n";
}
} else {
print "\n\nNo results found";
}
}
}
Everything works fine, except the print line in the foreach loop. This $_ variable still contains the values passed to the method.
Is there anyway to 'force' the new scope of values on $_, or will it always contain the original values?
If there are any good tutorials that explain how the scope of $_ works, that would also be cool!
Thanks
The problem here is that you're using really #_ instead of $_. The foreach loop changes $_, the scalar variable, not #_, which is what you're accessing if you index it by $_[X]. Also, check again the code to see what it is inside #results. If it is an array of arrays or refs, you may need to use the indirect ${$_}[0] or something like that.
In Perl, the _ name can refer to a number of different variables:
The common ones are:
$_ the default scalar (set by foreach, map, grep)
#_ the default array (set by calling a subroutine)
The less common:
%_ the default hash (not used by anything by default)
_ the default file handle (used by file test operators)
&_ an unused subroutine name
*_ the glob containing all of the above names
Each of these variables can be used independently of the others. In fact, the only way that they are related is that they are all contained within the *_ glob.
Since the sigils vary with arrays and hashes, when accessing an element, you use the bracket characters to determine which variable you are accessing:
$_[0] # element of #_
$_{...} # element of %_
$$_[0] # first element of the array reference stored in $_
$_->[0] # same
The for/foreach loop can accept a variable name to use rather than $_, and that might be clearer in your situation:
for my $result (#results) {...}
In general, if your code is longer than a few lines, or nested, you should name the variables rather than relying on the default ones.
Since your question was related more to variable names than scope, I have not discussed the actual scope surrounding the foreach loop, but in general, the following code is equivalent to what you have.
for (my $i = 0; $i < $#results; $i++) {
local *_ = \$results[$i];
...
}
The line local *_ = \$results[$i] installs the $ith element of #results into the scalar slot of the *_ glob, aka $_. At this point $_ contains an alias of the array element. The localization will unwind at the end of the loop. local creates a dynamic scope, so any subroutines called from within the loop will see the new value of $_ unless they also localize it. There is much more detail available about these concepts, but I think they are outside the scope of your question.
As others have pointed out:
You're really using #_ and not $_ in your print statement.
It's not good to keep stuff in these variables since they're used elsewhere.
Officially, $_ and #_ are global variables and aren't members of any package. You can localize the scope with my $_ although that's probably a really, really bad idea. The problem is that Perl could use them without you even knowing it. It's bad practice to depend upon their values for more than a few lines.
Here's a slight rewrite in your program getting rid of the dependency on #_ and $_ as much as possible:
sub showResults {
my $foo = shift; #Or some meaningful name
my $bar = shift; #Or some meaningful name
if (not defined $foo) {
print "didn't pass two parameters\n";
return; #No need to hang around
}
if (my #results = dbGetResults($foo)) {
foreach my $item (#results) {
...
}
}
Some modifications:
I used shift to give your two parameters actual names. foo and bar aren't good names, but I couldn't find out what dbGetResults was from, so I couldn't figure out what parameters you were looking for. The #_ is still being used when the parameters are passed, and my shift is depending upon the value of #_, but after the first two lines, I'm free.
Since your two parameters have actual names, I can use the if (not defined $bar) to see if both parameters were passed. I also changed this to the negative. This way, if they didn't pass both parameters, you can exit early. This way, your code has one less indent, and you don't have a if structure that takes up your entire subroutine. It makes it easier to understand your code.
I used foreach my $item (#results) instead of foreach (#results) and depend upon $_. Again, it's clearer what your program is doing, and you wouldn't have confused $_->[0] with $_[0] (I think that's what you were doing). It would have been obvious you wanted $item->[0].

Confusion about proper usage of dereference in Perl

I noticed the other day that - while altering values in a hash - that when you dereference a hash in Perl, you actually are making a copy of that hash. To confirm I wrote this quick little script:
#! perl
use warnings;
use strict;
my %h = ();
my $hRef = \%h;
my %h2 = %{$hRef};
my $h2Ref = \%h2;
if($hRef eq $h2Ref) {
print "\n\tThey're the same $hRef $h2Ref";
}
else {
print "\n\tThey're NOT the same $hRef $h2Ref";
}
print "\n\n";
The output:
They're NOT the same HASH(0x10ff6848) HASH(0x10fede18)
This leads me to realize that there could be spots in some of my scripts where they aren't behaving as expected. Why is it even like this in the first place? If you're passing or returning a hash, it would be more natural to assume that dereferencing the hash would allow me to alter the values of the hash being dereferenced. Instead I'm just making copies all over the place without any real need/reason to beyond making syntax a little more obvious.
I realize the fact that I hadn't even noticed this until now shows its probably not that big of a deal (in terms of the need to go fix in all of my scripts - but important going forward). I think its going to be pretty rare to see noticeable performance differences out of this, but that doesn't alter the fact that I'm still confused.
Is this by design in perl? Is there some explicit reason I don't know about for this; or is this just known and you - as the programmer - expected to know and write scripts accordingly?
The problem is that you are making a copy of the hash to work with in this line:
my %h2 = %{$hRef};
And that is understandable, since many posts here on SO use that idiom to make a local name for a hash, without explaining that it is actually making a copy.
In Perl, a hash is a plural value, just like an array. This means that in list context (such as you get when assigning to a hash) the aggregate is taken apart into a list of its contents. This list of pairs is then assembled into a new hash as shown.
What you want to do is work with the reference directly.
for (keys %$hRef) {...}
for (values %$href) {...}
my $x = $href->{some_key};
# or
my $x = $$href{some_key};
$$href{new_key} = 'new_value';
When working with a normal hash, you have the sigil which is either a % when talking about the entire hash, a $ when talking about a single element, and # when talking about a slice. Each of these sigils is then followed by an identifier.
%hash # whole hash
$hash{key} # element
#hash{qw(a b)} # slice
To work with a reference named $href simply replace the string hash in the above code with $href. In other words, $href is the complete name of the identifier:
%$href # whole hash
$$href{key} # element
#$href{qw(a b)} # slice
Each of these could be written in a more verbose form as:
%{$href}
${$href}{key}
#{$href}{qw(a b)}
Which is again a substitution of the string '$href' for 'hash' as the name of the identifier.
%{hash}
${hash}{key}
#{hash}{qw(a b)}
You can also use a dereferencing arrow when working with an element:
$hash->{key} # exactly the same as $$hash{key}
But I prefer the doubled sigil syntax since it is similar to the whole aggregate and slice syntax, as well as the normal non-reference syntax.
So to sum up, any time you write something like this:
my #array = #$array_ref;
my %hash = %$hash_ref;
You will be making a copy of the first level of each aggregate. When using the dereferencing syntax directly, you will be working on the actual values, and not a copy.
If you want a REAL local name for a hash, but want to work on the same hash, you can use the local keyword to create an alias.
sub some_sub {
my $hash_ref = shift;
our %hash; # declare a lexical name for the global %{__PACKAGE__::hash}
local *hash = \%$hash_ref;
# install the hash ref into the glob
# the `\%` bit ensures we have a hash ref
# use %hash here, all changes will be made to $hash_ref
} # local unwinds here, restoring the global to its previous value if any
That is the pure Perl way of aliasing. If you want to use a my variable to hold the alias, you can use the module Data::Alias
You are confusing the actions of dereferencing, which does not inherently create a copy, and using a hash in list context and assigning that list, which does. $hashref->{'a'} is a dereference, but most certainly does affect the original hash. This is true for $#$arrayref or values(%$hashref) also.
Without the assignment, just the list context %$hashref is a mixed beast; the resulting list contains copies of the hash keys but aliases to the actual hash values. You can see this in action:
$ perl -wle'$x={"a".."f"}; for (%$x) { $_=chr(ord($_)+10) }; print %$x'
epcnal
vs.
$ perl -wle'$x={"a".."f"}; %y=%$x; for (%y) { $_=chr(ord($_)+10) }; print %$x; print %y'
efcdab
epcnal
but %$hashref isn't acting any differently than %hash here.
No, dereferencing does not create a copy of the referent. It's my that creates a new variable.
$ perl -E'
my %h1; my $h1 = \%h1;
my %h2; my $h2 = \%h2;
say $h1;
say $h2;
say $h1 == $h2 ?1:0;
'
HASH(0x83b62e0)
HASH(0x83b6340)
0
$ perl -E'
my %h;
my $h1 = \%h;
my $h2 = \%h;
say $h1;
say $h2;
say $h1 == $h2 ?1:0;
'
HASH(0x9eae2d8)
HASH(0x9eae2d8)
1
No, $#{$someArrayHashRef} does not create a new array.
If perl did what you suggest, then variables would get aliased very easily, which would be far more confusing. As it is, you can alias variables with globbing, but you need to do so explicitly.