Confusion about using Perl's @_ variable with a regex capture

When I use the following code, subroutine f can't print @_ correctly because of the substitution on $tmp before it, which is explainable.
use strict;
use warnings;
sub f {
    # print("args = ", @_);
    my $tmp = "111";
    $tmp =~ s/\d+//;
    print("args = ", @_);
}
"dd11ddd" =~ /(?<var>\d+)/;
f($+{"var"});
But when I uncomment the first print statement, both print statements show the correct @_, which confuses me: why hasn't the capture group been overwritten? Or is there some underlying mechanism of Perl that I don't know about? Please help, thanks.
When I pass a capture group into a Perl subroutine, the capture group isn't overwritten as I expected.
I want to know why this happens and how to explain it correctly.

Perl arguments are passed by reference.
use feature qw( say );
sub f {
    $_[0] = "def";
}
my $x = "abc";
say $x; # abc
f( $x );
say $x; # def
%+ is affected by $tmp =~ s/\d+//, and thus so is $_[0]. We don't usually run into problems because we usually make an explicit copy of the arguments.
sub f {
my $y = shift;
$y = "def";
}
my $x = "abc";
say $x; # abc
f( $x );
say $x; # abc
Passing a copy of the scalar would also avoid the problems.
sub f {
$_[0] = "def";
}
my $x = "abc";
say $x; # abc
f( "$x" );
say $x; # abc
The above explains why you get weird behaviour and how to avoid it, but not why accessing $_[0] before the substitution seems to fix it. Honestly, it doesn't really matter. It's some weird interaction between the magical nature of %+, the implicit localization of %+, the optimizations to avoid needless implicit localizations of %+, and the way localization works.
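Applied to the code from the question, the advice above might look like the following minimal sketch. The variable names $captured, $arg and $line are only illustrative, and f is changed to return its text rather than print it so the result is easy to check. The capture is copied into an ordinary scalar before the call, and the argument is copied again inside the subroutine, so nothing that f sees is still tied to %+.

```perl
use strict;
use warnings;

sub f {
    my ($arg) = @_;      # copy the argument before running any regex
    my $tmp = "111";
    $tmp =~ s/\d+//;     # a successful match here changes what %+ refers to
    return "args = $arg";
}

"dd11ddd" =~ /(?<var>\d+)/ or die "no match";
my $captured = $+{var};  # plain scalar copy, no longer connected to %+
my $line = f($captured);
print "$line\n";         # prints "args = 11"
```

Either copy on its own should be enough here; doing both simply makes the intent obvious at the call site and inside the subroutine.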

Related

Perl: Modify variable passed as param to subroutine

I need to modify a variable inside a routine, so it keeps the changes after leaving the routine. Here's an example:
$text = "hello";
&convert_to_uppercase($text);
print $text;
I want to see "HELLO" on the screen, not "hello".
The routine would be:
sub convert_to_uppercase($text){
<something like $text = uc($text);>
}
I know how to do it in PHP, but it seems that the parameters are not changed the same way. And, I've been searching everywhere and I couldn't find a concrete answer.
You really shouldn't use an ampersand & when calling a Perl subroutine. It is necessary only when treating the code as a data item, for instance when taking a reference, like \&convert_to_uppercase. Using it in a call hasn't been necessary since Perl 5, and it does some arcane things that you probably don't want.
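One of those arcane things is that a call written as &name; with no parentheses hands the callee the caller's current @_ instead of an empty argument list. A small sketch (the subroutine names here are made up for illustration):

```perl
use strict;
use warnings;

sub count_args { return scalar @_ }

sub with_ampersand    { return &count_args; }   # no parens: count_args sees the caller's @_
sub without_ampersand { return count_args(); }  # ordinary call with an empty argument list

my $shared = with_ampersand(1, 2, 3);
my $fresh  = without_ampersand(1, 2, 3);
print "with &: $shared, without &: $fresh\n";   # prints "with &: 3, without &: 0"
```

Calling with & also bypasses any prototype declared on the subroutine, which is another reason to avoid it in ordinary calls.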
It is unusual for subroutines to modify their parameters, but the elements of @_ are aliases of the actual parameters, so you can do what you ask by modifying that array.
If you write your subroutine like this
sub convert_to_uppercase {
$_[0] = uc $_[0];
}
then it will do what you ask. But it is generally best to return the modified value so that the decision on whether to overwrite the original value can be taken by the calling code. For instance, if I have
sub upper_case {
uc shift;
}
then it can be called either as
my $text = "hello";
$text = upper_case($text);
print $text;
which does as you require, and modifies $text; or as
my $text = "hello";
print upper_case($text);
which leaves $text unchanged, but returns the altered value.
Passing a reference and modifying the original variable inside the subroutine would be done like this:
$text = 'hello';
convert_to_uppercase(\$text); #notice the \ before $text
print $text;
sub convert_to_uppercase { # Perl doesn't specify arguments here
    ### arguments will be in @_, so @_ is now a list like (\$text)
    my $ref = shift; # $ref is NOT 'hello'; it's a reference to $text
    ### add some output so you can see what's going on:
    print 'Variable $ref is: ', $ref, " \n"; # will print something like SCALAR(0xad1d2)
    print 'Variable ${$ref} is: ', ${$ref}, " \n"; # will print 'hello'
    # Now do what this function is supposed to do:
    ${$ref} = uc ${$ref}; # it's modifying the original variable, not a copy of it
}
The other way is to create a return value inside the subroutine and modify the variable outside of the subroutine:
$text = 'hello';
$text = convert_to_uppercase($text); #there's no \ this time
print $text;
sub convert_to_uppercase {
    # @_ contains 'hello'
    my $input = shift; # $input is 'hello'
    return uc $input; # returns 'HELLO'
}
But the convert_to_uppercase routine seems redundant because that's what uc does. Skip all of that and just do this:
$text = 'hello';
$text = uc $text;

Is it possible to convert a stringified reference from a SCALAR back to a REF? [duplicate]

Is there any way to get Perl to convert the stringified version e.g (ARRAY(0x8152c28)) of an array reference to the actual array reference?
For example
perl -e 'use Data::Dumper; $a = [1,2,3];$b = $a; $a = $a.""; warn Dumper (Then some magic happens);'
would yield
$VAR1 = [
1,
2,
3
];
Yes, you can do this (even without Inline C). An example:
use strict;
use warnings;
# make a stringified reference
my $array_ref = [ qw/foo bar baz/ ];
my $stringified_ref = "$array_ref";
use B; # core module providing introspection facilities
# extract the hex address
my ($addr) = $stringified_ref =~ /.*(0x\w+)/;
# fake up a B object of the correct class for this type of reference
# and convert it back to a real reference
my $real_ref = bless(\(0+hex $addr), "B::AV")->object_2svref;
print join(",", @$real_ref), "\n";
But don't do that. If your actual object is freed or reused, you may very well end up getting segfaults.
Whatever you are actually trying to achieve, there is certainly a better way.
A comment to another answer reveals that the stringification is due to using a reference as a hash key. As responded to there, the better way to do that is the well-battle-tested Tie::RefHash.
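For the hash-key case, a minimal sketch of what using Tie::RefHash could look like is shown below. The hash name %seen and the stored value are invented for the example. With a tied hash of this kind, keys come back as real references rather than as strings such as ARRAY(0x...).

```perl
use strict;
use warnings;
use Tie::RefHash;

# Keys of a Tie::RefHash hash stay real references.
tie my %seen, 'Tie::RefHash';

my $array_ref = [ qw/foo bar baz/ ];
$seen{$array_ref} = 'some value';

for my $key (keys %seen) {
    # $key can be dereferenced directly; no address parsing needed
    print ref($key), ": @$key\n";   # prints "ARRAY: foo bar baz"
}
```

Because the hash holds the references themselves, the referenced data stays alive for as long as the entry exists, so there is no dangling address to worry about.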
The first question is: do you really want to do this?
Where is that string coming from?
If it's coming from outside your Perl program, the pointer value (the hex digits) is going to be meaningless, and there's no way to do it.
If it's coming from inside your program, then there's no need to stringify it in the first place.
Yes, it's possible: use Devel::FindRef.
use strict;
use warnings;
use Data::Dumper;
use Devel::FindRef;
sub ref_again {
    my $str = @_ ? shift : $_;
    my ($addr) = map hex, ($str =~ /\((.+?)\)/);
    Devel::FindRef::ptr2ref $addr;
}
my $ref = [1, 2, 3];
my $str = "$ref";
my $ref_again = ref_again($str);
print Dumper($ref_again);
The stringified version contains the memory address of the array object, so yes, you can recover it. This code works for me, anyway (Cygwin, perl 5.8):
use Inline C;
@a = (1,2,3,8,12,17);
$a = \@a . "";
print "Stringified array ref is $a\n";
($addr) = $a =~ /0x(\w+)/;
$addr = hex($addr);
$c = recover_arrayref($addr);
@c = @$c;
print join ":", @c;
__END__
__C__
AV* recover_arrayref(int av_address) { return (AV*) av_address; }
.
$ perl ref-to-av.pl
Stringified array ref is ARRAY(0x67ead8)
1:2:3:8:12:17
I'm not sure why you want to do this, but if you really need it, ignore the answers that use the tricks to look into memory. They'll only cause you problems.
Why do you want to do this? There's probably a better design. Where are you getting that stringified reference from?
Let's say you need to do it for whatever reason. First, create a registry of objects where the hash key is the stringified form, and the value is a weakened reference:
use Scalar::Util qw(weaken);
my $array = [ ... ];
$registry{ $array } = $array;
weaken( $registry{ $array } ); # doesn't count toward ref count
Now, when you have the stringified form, you just look it up in the hash, checking to see that it's still a reference:
if( ref $registry{$string} ) { ... }
You could also try Tie::RefHash and let it handle all of the details of this.
There is a longer example of this in Intermediate Perl.
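Put together, the registry idea might look like this runnable sketch. The helper names register and lookup are hypothetical, chosen only for this example; weaken is the real function from Scalar::Util.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my %registry;

# Store a weak reference under the stringified form and return that string.
sub register {
    my ($ref) = @_;
    $registry{"$ref"} = $ref;
    weaken( $registry{"$ref"} );   # doesn't count toward the ref count
    return "$ref";
}

# Return the live reference for a stringified form, or undef if it is gone.
sub lookup {
    my ($string) = @_;
    my $ref = $registry{$string};
    return ref($ref) ? $ref : undef;
}

my $array = [ 1, 2, 3 ];
my $key   = register($array);

my $found = lookup($key);
print "found: @$found\n";          # prints "found: 1 2 3"

undef $found;
undef $array;                      # last strong reference is released
print defined( lookup($key) ) ? "still alive\n" : "gone\n";   # prints "gone"
```

Since the registry entry is weak, it becomes undef once the data is freed, so a stale string can never be turned back into a dangling pointer.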
In case someone finds this useful, I'm extending tobyink's answer by adding support for detecting segmentation faults. There are two approaches I discovered. The first way locally replaces $SIG{SEGV} and $SIG{BUS} before dereferencing. The second way masks the child signal and checks if a forked child can dereference successfully. The first way is significantly faster than the second.
Anyone is welcome to improve this answer.
First Approach
sub unstringify_ref($) {
    use bigint qw(hex);
    use Devel::FindRef;
    my $str = @_ ? shift : $_;
    if (defined $str and $str =~ /\((0x[a-fA-F0-9]+)\)$/) {
        my $addr = (hex $1)->bstr;
        local $@;
        return eval {
            local $SIG{SEGV} = sub { die };
            local $SIG{BUS} = sub { die };
            return Devel::FindRef::ptr2ref $addr;
        };
    }
    return undef;
}
I'm not sure if any other signals can occur in an attempt to access illegal memory.
Second Approach
sub unstringify_ref($) {
    use bigint qw(hex);
    use Devel::FindRef;
    use Signal::Mask;
    my $str = @_ ? shift : $_;
    if (defined $str and $str =~ /\((0x[a-fA-F0-9]+)\)$/) {
        my $addr = (hex $1)->bstr;
        local $!;
        local $?;
        local $Signal::Mask{CHLD} = 1;
        if (defined(my $kid = fork)) {
            # Child -- This might seg fault on invalid address.
            exit(not Devel::FindRef::ptr2ref $addr) unless $kid;
            # Parent
            waitpid $kid, 0;
            return Devel::FindRef::ptr2ref $addr if $? == 0;
        } else {
            # double quotes so that $! is interpolated into the message
            warn "Unable to fork: $!";
        }
    }
    return undef;
}
I'm not sure if the return value of waitpid needs to be checked.

Routine as argument -- generic variables not working

I am working on writing a gaming system (wargames, etc.) and am creating the system for creating and displaying hex maps. I realized quickly that I am repeatedly doing a nested loop of x=(0..maxx) and y=(0..maxy). So I attempted to adapt some code I found somewhere (one of the advanced perl books, I forget where) to create an easier way to do this sort of looping thing. This is what I came up with:
sub fillmap (&@) {
    my $code = shift;
    no strict 'refs';
    use vars qw($x $y);
    my $caller = caller;
    local(*{$caller."::x"}) = \my $x;
    local(*{$caller."::y"}) = \my $y;
    foreach $x (0..5) {
        foreach $y (0..3) {
            warn "fillmap $x,$y\n";
            &{$code}($x,$y);
        }
    }
}
It's supposed to work like sort, but using $x and $y instead of $a and $b.
Note: the warn statement is for debugging. I also simplified the x and y ranges (the array passed in determines the maxx and maxy values, but I didn't want to muddy this discussion with the routines for calculating them... I just hard-coded them to maxx=5 and maxy=3).
So, executing this routine like so:
fillmap {warn "$x,$y\n";} @map;
should yield a list of the x,y pairs. But instead, it gives me this:
fillmap 0,0
,
fillmap 0,1
,
fillmap 0,2
,
fillmap 0,3
,
fillmap 1,0
,
...
Note, the "fillmap" lines are from the subroutine for debugging. But instead of each x,y pair, I just get the comma ($x and $y are undefined).
What am I doing wrong?
The problem is that for $x does its own localisation. The $x inside the loop isn't the $x that's aliased to $caller::x.
You need to do one of the following:
Copy $x into $caller::x inside the loop.
Alias $caller::x to $x inside the loop.
The following does the latter:
use strict;
use warnings;
sub fillmap(&@) {
    my $code = shift;
    my $caller = caller();
    my $xp = do { no strict 'refs'; \*{$caller.'::x'} }; local *$xp;
    my $yp = do { no strict 'refs'; \*{$caller.'::y'} }; local *$yp;
    for my $x (0..1) {
        *$xp = \$x;
        for my $y (0..2) {
            *$yp = \$y;
            $code->();
        }
    }
}
our ($x, $y);
fillmap { warn "$x,$y\n"; } '...';
You could avoid the need for our ($x, $y); by using $a and $b instead of $x and $y. You can't solve the problem by moving it (or use vars qw( $x $y );) into fillmap because you obviously intend fillmap to be used in a different package and lexical scope than the caller.
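For comparison, the first option listed above (copying the loop values into the caller's $x and $y) might be sketched as follows. This is only an illustration under the same assumptions as the code above: the ranges are hard-coded, and the @pairs array is added purely to collect the output.

```perl
use strict;
use warnings;

sub fillmap(&@) {
    my $code   = shift;
    my $caller = caller();
    no strict 'refs';
    # Save the caller's package variables and restore them on exit.
    local ${ $caller . '::x' };
    local ${ $caller . '::y' };
    for my $x (0..1) {
        ${ $caller . '::x' } = $x;       # copy, rather than alias
        for my $y (0..2) {
            ${ $caller . '::y' } = $y;
            $code->();
        }
    }
}

our ($x, $y);
my @pairs;
fillmap { push @pairs, "$x,$y" } '...';
print "$_\n" for @pairs;   # 0,0  0,1  0,2  1,0  1,1  1,2 (one per line)
```

The difference from aliasing is that assignments made to $x or $y inside the block affect only the copies, not the loop variables in fillmap.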

Perl, evaluate string lazily

Consider the following Perl code.
#!/usr/bin/perl
use strict;
use warnings;
$b="1";
my $a="${b}";
$b="2";
print $a;
The script obviously outputs 1. I would like it to be whatever the current value of $b is.
What would be the smartest way in Perl to achieve lazy evaluation like this? I would like the ${b} to remain "unreplaced" until $a is needed.
I'm more interested in knowing why you want to do this. You could use a variety of approaches depending on what you really need to do.
You could wrap up the code in a coderef, and only evaluate it when you need it:
use strict; use warnings;
my $b = '1';
my $a = sub { $b };
$b = '2';
print $a->();
A variant of this would be to use a named function as a closure (this is probably the best approach, in the larger context of your calling code):
my $b = '1';
sub print_b
{
print $b;
}
$b = '2';
print_b();
You could use a reference to the original variable, and dereference it as needed:
my $b = '1';
my $a = \$b;
$b = '2';
print $$a;
What you want is not lazy evaluation, but late binding. To get it in Perl, you need to use eval.
my $number = 3;
my $val = "";
my $x = '$val="${number}"';
$number = 42;
eval $x;
print "val is now $val\n";
Be advised that eval is usually inefficient as well as methodically atrocious. You are almost certainly better off using a solution from one of the other answers.
Perl will interpolate a string when the code runs, and I don't know of a way to make it not do so, short of formats (which are ugly IMO). What you could do, though, is change "when the code runs" to something more convenient, by wrapping the string in a sub and calling it when you need the string interpolated...
$b = "1";
my $a = sub { "\$b is $b" };
$b = "2";
print &$a;
Or, you could do some eval magic, but it's a bit more intrusive (you'd need to do some manipulation of the string in order to achieve it).
As others have mentioned, Perl will only evaluate strings as you have written them using eval to invoke the compiler at runtime. You could use references as pointed out in some other answers, but that changes the way the code looks ($$a vs $a). However, this being Perl, there is a way to hide advanced functionality behind a simple variable, by using tie.
{package Lazy;
sub TIESCALAR {bless \$_[1]} # store a reference to $b
sub FETCH {${$_[0]}} # dereference $b
sub STORE {${$_[0]} = $_[1]} # dereference $b and assign to it
sub new {tie $_[1] => $_[0], $_[2]} # syntactic sugar
}
my $b = 1;
Lazy->new( my $a => $b ); # '=>' or ',' but not '='
print "$a\n"; # prints 1
$b = 2;
print "$a\n"; # prints 2
You can look up the documentation for tie, but in a nutshell, it allows you to define your own implementation of a variable (for scalars, arrays, hashes, or file handles). So this code creates the new variable $a with an implementation that gets or sets the current value of $b (by storing a reference to $b internally). The new method is not strictly needed (the constructor is actually TIESCALAR) but is provided as syntactic sugar to avoid having to use tie directly in the calling code (which would be tie my $a, 'Lazy', $b;).
You wish to pretend that $a refers to something that is evaluated when $a is used. You can only do that if $a is not truly a scalar: it could be a function (as in cHao's answer) or, in this simple case, a reference to the other variable:
my $b="1";
my $a= \$b;
$b="2";
print $$a;
I would like the ${b} to remain "unreplaced" until $a is needed.
Then I'd recommend eschewing string interpolation, instead using sprintf, so that you "interpolate" when needed.
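In its plainest form that could look like the sketch below, where the format string is stored up front and the value is only filled in at the point of use. The names $template, $value and $text are chosen just for this example.

```perl
use strict;
use warnings;

my $value    = "1";
my $template = 'value is %s';    # nothing is interpolated yet

$value = "2";

# The current value is substituted only when the string is needed.
my $text = sprintf $template, $value;
print "$text\n";                 # prints "value is 2"
```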
Of course, on this basis you could tie together something quick(ish) and dirty:
use strict;
use warnings;
package LazySprintf;
# oh, yuck
sub TIESCALAR { my $class = shift; bless \@_, $class; }
sub FETCH { my $self = shift; sprintf $self->[0], @$self[1..$#$self]; }
package main;
my $var = "foo";
tie my $lazy, 'LazySprintf', '%s', $var;
print "$lazy\n"; # prints "foo\n"
$var = "bar";
print "$lazy\n"; # prints "bar\n";
Works with more exotic format specifiers, too. Yuck.
