I am currently trying to implement a suffix tree using Perl, however, when I attempt to set the reference for the tree function, the reference location is not set, if I pass the address via a string then check the text in the string vs the location of the hash table and they are different. Any help is appreciated!
use strict;
use warnings;
use Data::Dumper;
my $count = 0;
my $str; # holds the complete string
my %root;
# takes in all lines of code
open(IN, '<:encoding(UTF-8)', $ARGV[0]) or die "Could not open file '$ARGV[0]' $!\n";
while (<IN>) {
chomp;
# concatinates with string
$str .= $_;
}
# closes input
close(IN);
#length of input string
my $l_size = length($str) - 1;
#recursively makes
sub tree {
#recursive root
my %treeRoot;
#checking incomming data
print "1 ".Dumper(\#_)."\n";
#checking incomming data
print "2 ".Dumper(\%root)."\n";
#attempts to set tree's refrence
\%treeRoot, $count = #_;
#checking incomming data
print "3 ".Dumper(\%root)."\n";
#checking incomming data
print "4 ".$count."\n";
#leaf for each node
my %leaf;
for (my $i = 0; $i < $l_size; $i++) {
#creates alphabet tree
$treeRoot { substr($str, $i, 1) } = %leaf;
}
#checking incomming data
print "5 ".Dumper(\%root)."\n";
while ($count > 0) {
#checking incomming data
print "loop 6 ".Dumper(\%root)."\n";
$count--;
#checking incomming data
print "loop 7 ".$count."\n";
#recursion not implamented yet
#tree(\$treeRoot{'a'}, $count);
}
}
tree(\%root, 2);
#print Dumper(\%root);
You need parentheses to disambiguate. This:
\%treeRoot, $count = #_;
means this:
\%treeRoot;
$count = #_;
Because the assignment operator = has higher precedence than the comma operator ,. The warning that you got from running that code tells you this: Useless use of reference constructor in void context.
To pass the arguments correctly, you need parentheses:
(\%treeRoot, $count) = #_;
Unfortunately, this does not work, because you cannot assign to a reference this way. The following error tells you that: Can't modify reference constructor in list assignment.
So what you need is to pass the reference to a scalar:
my ($href, $count) = #_;
print $href->{'value'};
I think this method is a bit backwards, though. Passing variables by reference is likely to become a source of bugs. A more natural solution is to use the return value of the subroutine to assign values:
sub foo {
my %hash;
$hash{'value'} = ....
....
return \%hash;
}
my $hashref = foo();
print $hashref->{'value'};
Your question isn't actually how to pass a hash reference, but how to receive it, as the following will not work:
\%treeRoot, $count = #_;
Basically, you need to assign your reference to a scalar like so:
use strict;
use warnings;
sub example_sub {
my ($hashref, $count) = #_;
# Add two values to the hash:
$hashref->{newkey} = 'val';
$hashref->{newkey2} = 'val2';
}
my %root;
example_sub(\%root, 2);
use Data::Dump;
dd \%root;
Outputs:
{ newkey => "val", newkey2 => "val2" }
If you don't want to modify your original hash, you can assign the values to a new hash within the sub:
my %newhash = %$hashref;
For more info on working with references, check out: perlref - Perl references and nested data structures
Related
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 2 months ago.
Improve this question
I have inherited an older script that uses hash references that I don't understand. It results in:
Can't use an undefined value as a HASH reference at
./make_quar_dbfile.pl line 65.
63 my $bucket = sprintf('%02x', $i);
64 my $file = sprintf('%s/%02x.db', $qdir, $i);
65 %{$hashes{$bucket}} ? 1 : next;
66 tie (my %hash, 'DB_File', $file, O_RDWR, 0600) || die "Can't open db file: $! \n ";
67 %hash = %{$hashes{$bucket}};
68 untie %hash;
The script reads through a number of gzipd emails to identify the sender/recip/subject/date etc., then writes that info to a DB_File hash.
This script used to work with older versions of Perl, but looks like it now is no longer compliant.
I'd really like to understand how this works, but I don't fully understand reference/dereference, why it's even necessary here, and the %{$var} notation. All of the references I've studied talk about hash references in terms of $hash_ref = \%author; not %hash_ref = %{$author}, for example.
Ideas on how to get this to work with hash references would be greatly appreciated.
#!/usr/bin/perl -w
use DB_File;
use File::Basename qw(basename);
use vars qw($verbose);
use strict;
use warnings;
sub DBG($);
$verbose = shift || 1;
my $qdir = '/var/spool/amavisd/qdb';
my $source_dir = '/var/spool/amavisd/quarantine';
my $uid = getpwnam('amavis');
my $gid = getgrnam('amavis');
my %hashes = ( );
my $me = basename($0);
my $version = '1.9';
my $steps = 100;
my $cnt = 0;
DBG("- Creating initial database files...");
for (my $i = 0; $i < 256; $i++) {
my $file = sprintf('%s/%02x.db', $qdir, $i);
unlink $file || DBG("Could not unlink $file to empty db: $! \n");
tie (my %hash, "DB_File", $file, O_CREAT, 0600) || die "Can't open db file: $! \n";
untie %hash;
chown($uid, $gid, $file) || die "Unable to set attributes on file: $! \n";
}
DBG("done\n");
opendir SOURCEDIR, $source_dir || die "Cannot open $source_dir: $! \n";
DBG("- Building hashes... ");
foreach my $f (sort readdir SOURCEDIR) {
next if ($f eq "." || $f eq "..");
if ($f =~ m/^(spam|virus)\-([^\-]+)\-([^\-]+)(\.gz)?/) {
my $type = $1;
my $key = $3;
my $bucket = substr($key, 0, 2);
my $d = $2;
my $subj = '';
my $to = '';
my $from = '';
my $size = '';
my $score = '0.0';
if (($cnt % $steps) == 0) { DBG(sprintf("\e[8D%-8d", $cnt)); } $cnt++;
if ($f =~ /\.gz$/ && open IN, "zcat $source_dir/$f |") {
while(<IN>) {
last if ($_ eq "\n");
$subj = $1 if (/^Subject:\s*(.*)$/);
$to = $1 if (/^To:\s*(.*)$/);
$from = $1 if (/^From:\s*(.*)$/);
$score = $1 if (/score=(\d{1,3}\.\d)/);
}
close IN;
$to =~ s/^.*\<(.*)\>.*$/$1/;
$from =~ s/^.*\<(.*)\>.*$/$1/;
$size = (stat("$source_dir/$f"))[7];
$hashes{$bucket}->{$f} = "$type\t$d\t$size\t$from\t$to\t$subj\t$score";
}
}
}
closedir SOURCEDIR;
DBG("...done\n\n- Populating database files...");
for (my $i = 0; $i < 256; $i++) {
my $bucket = sprintf('%02x', $i);
my $file = sprintf('%s/%02x.db', $qdir, $i);
%{$hashes{$bucket}} ? 1 : next;
tie (my %hash, 'DB_File', $file, O_RDWR, 0600) || die "Can't open db file: $! \n ";
%hash = %{$hashes{$bucket}};
untie %hash;
}
exit(0);
sub DBG($) { my $msg = shift; print $msg if ($verbose); }
What is $hash{$key}? A value associated with the (value of) $key, which must be a scalar. So we get the $value out of my %hash = ( $key => $value ).
That's a string, or a number. Or a filehandle. Or, an array reference, or a hash reference. (Or an object perhaps, normally a blessed hash reference.) They are all scalars, single-valued things, and as such are a legitimate value in a hash.
The syntax %{ ... } de-references a hash reference† so judged by %{ $hashes{$bucket} } that code expects there to be a hash reference. So the error says that there is actually nothing in %hashes for that value of a would-be key ($bucket), so it cannot "de-reference" it. There is either no key that is the value of $bucket at that point in the loop, or there is such a key but it has never been assigned anything.
So go debug it. Add printing statements through the loops so you can see what values are there and what they are, and which ones aren't even as they are assumed to be. Hard to tell what fails without running that program.
Then, the line %{$hashes{$bucket}} ? 1 : next; is a little silly. The condition of the ternary operator evaluates to a boolean, "true" (not undefined, not 0, not empty string '') or false. So it tests whether $hashes{$bucket} has a hashref with at least some keys, and if it does then it returns 1; so, the for loop continues. Otherwise it skips to the next iteration.
Well, then skip to next if there is not a (non-empty) hashref there:
next if not defined $hashes{$bucket} or not %{ $hashes{$bucket} };
Note how we first test whether there is such a key, and only then attempt to derefence it.
† Whatever expression may be inside the curlies must evaluate to a hash reference. (If it's else, like a number or a string, the code would still exit with an error but with a different one.)
So, in this code, the hash %hashes must have a key that is the value of $bucket at that point, and the value for that key must be a hash reference. Then, the ternary operator tests whether the hash obtained from that hash reference has any keys.
You need to understand references first, this is a kind of how-to :
#!/usr/bin/perl
use strict; use warnings;
use feature qw/say/;
use Data::Dumper;
my $var = {}; # I create a HASH ref explicitly
say "I created a HASH ref explicitly:";
say ref($var);
say "Now, let's add any type of content:";
say "Adding a ARRAY:";
push #{ $var->{arr} }, (0..5);
say Dumper $var;
say "Now, I add a new HASH";
$var->{new_hash} = {
foo => "value",
bar => "other"
};
say Dumper $var;
say 'To access the data in $var without Data::Dumper, we need to dereference what we want to retrieve';
say "to retrieve a HASH ref, we need to dereference with %:";
while (my ($key, $value) = each %{ $var->{new_hash} }) {
say "key=$key value=$value";
}
say "To retrieve the ARRAY ref:";
say join "\n", #{ $var->{arr} };
Output
I created a HASH ref explicitely:
HASH
Now, let's add any type of content:
Adding a ARRAY:
$VAR1 = {
'arr' => [
0,
1,
2,
3,
4,
5
]
};
Now, I add a new HASH
$VAR1 = {
'new_hash' => {
'foo' => 'value',
'bar' => 'other'
},
'arr' => [
0,
1,
2,
3,
4,
5
]
};
To access the data in $var without Data::Dumper, we need to dereference what we want to retrieve
to retrieve a HASH ref, we need to dereference with %:
key=foo value=value
key=bar value=other
To retrieve the ARRAY ref:
0
1
2
3
4
5
Now with your code, instead of
%{$hashes{$bucket}} ? 1 : next;
You should test the HASH ref first, because Perl say it's undefined, let's debug a bit:
use Data::Dumper;
print Dumper $hashes;
print "bucket=$bucket\n";
if (defined $hashes{$bucket}) {
print "Defined array\n";
}
else {
print "NOT defined array\n";
}
I am looking to parse a tab delimited text file into a nested hash with a subroutine. Each file row will be keyed by a unique id from a uid column(s), with the header row as nested keys. Which column(s) is(are) to become the uid changes (as sometimes there isn't a unique column, so the uid has to be a combination of columns). My issue is with the $uid variable, which I pass as a non-interpolated string. When I try to use it inside the subroutine in an interpolated way, it will only give me the non-interpolated value:
use strict;
use warnings;
my $lofrow = tablehash($lof_file, '$row{gene}', "transcript", "ENST");
##sub to generate table hash from file w/ headers
##input values are file, uid, header starter, row starter, max column number
##returns hash reference (deref it)
sub tablehash {
my ($file, $uid, $headstart, $rowstart, $colnum) = #_;
if (!$colnum){ # takes care of a unknown number of columns
$colnum = 0;
}
open(INA, $file) or die "failed to open $file, $!\n";
my %table; # permanent hash table
my %row; # hash of column values for each row
my #names = (); # column headers
my #values = (); # line/row values
while (chomp(my $line = <INA>)){ # reading lines for lof info
if ($line =~ /^$headstart/){
#names = split(/\t/, $line, $colnum);
} elsif ($line =~ /^$rowstart/){ # splitting lof info columns into variables
#values = split(/\t/, $line, $colnum);
#row{#names} = #values;
print qq($uid\t$row{gene}\n); # problem: prints "$row{gene} ACB1"
$table{"$uid"} = { %row }; # puts row hash into permanent hash, but with $row{gene} key)
}
}
close INA;
return \%table;
}
I am out of ideas. I could put $table{$row{$uid}} and simply pass "gene", but in a couple of instances I want to have a $uid of "$row{gene}|$row{rsid}" producing $table{ACB1|123456}
Interpolation is a feature of the Perl parser. When you write something like
"foo $bar baz"
, Perl compiles it into something like
'foo ' . $bar . ' $baz'
It does not interpret data at runtime.
What you have is a string where one of the characters happens to be $ but that has no special effect.
There are at least two possible ways to do something like what you want. One of them is to use a function, not a string. (Which makes sense because interpolation really means concatenation at runtime, and the way to pass code around is to wrap it in a function.)
my $lofrow = tablehash($lof_file, sub { my ($row) = #_; $row->{gene} }, "transcript", "ENST");
sub tablehash {
my ($file, $mkuid, $headstart, $rowstart, $colnum) = #_;
...
my $uid = $mkuid->(\%row);
$table{$uid} = { %row };
Here $mkuid isn't a string but a reference to a function that (given a hash reference) returns a uid string. tablehash calls it, passing a reference to %row to it. You can then later change it to e.g.
my $lofrow = tablehash($lof_file, sub { my ($row) = #_; "$row->{gene}|$row->{rsid}" }, "transcript", "ENST");
Another solution is to use what amounts to a template string:
my $lofrow = tablehash($lof_file, "gene|rsid", "transcript", "ENST");
sub tablehash {
my ($file, $uid_template, $headstart, $rowstart, $colnum) = #_;
...
(my $uid = $uid_template) =~ s/(\w+)/$row{$1}/g;
$table{$uid} = { %row };
The s/// code goes through the template string and manually replaces every word by the corresponding value from %row.
Random notes:
Bonus points for using strict and warnings.
if (!$colnum) { $colnum = 0; } can be simplified to $colnum ||= 0;.
Use lexical variables instead of bareword filehandles. Barewords are effectively global variables (and syntactically awkward because they're not first-class citizens of the language).
Always use the 3-argument form of open to avoid unexpected interpretation of the second argument.
Include the name of your program in error messages (either explicitly with $0 or implicitly by omitting \n from die).
my #foo = (); my %bar = (); is redundant and can be simplified to my #foo; my %bar;. Arrays and hashes start out empty; overwriting them with an empty list is pointless.
chomp(my $line = <INA>) will throw a warning when you reach EOF (because you're trying to chomp a variable containing undef).
my %row; should probably be declared inside the loop. It looks like it's supposed to only contain values from the current line.
Suggestion:
open my $fh, '<', $file or die "$0: can't open $file: $!\n";
while (my $line = readline $fh) {
chomp $line;
...
}
I am currently writing a perl script where I have a reference to an array (students) of references. After adding the hash references to the array. Now I add the references to the array of students and then ask the user how to sort them. This is where it gets confusing. I do not know how to deference the sorted array. Using dumper I can get the sorted array but in a unorganized output. How can I deference the array of hash references after sorting?
#!bin/usr/perl
use strict;
use warnings;
use Data::Dumper;
use 5.010;
#reference to a var $r = \$var; Deferencing $$r
#reference to an array $r = \#var ; Deferencing #$r
#referenc to a hash $r = \%var ; deferencing %$r
my $filename = $ARGV[0];
my $students = [];
open ( INPUT_FILE , '<', "$filename" ) or die "Could not open to read \n ";
sub readLines{
while(my $currentLine = <INPUT_FILE>){
chomp($currentLine);
my #myLine = split(/\s+/,$currentLine);
my %temphash = (
name => "$myLine[0]",
age => "$myLine[1]",
GPA => "$myLine[2]",
MA => "$myLine[3]"
);
pushToStudents(\%temphash);
}
}
sub pushToStudents{
my $data = shift;
push $students ,$data;
}
sub printData{
my $COMMAND = shift;
if($COMMAND eq "sort up"){
my #sortup = sort{ $a->{name} cmp $b->{name} } #$students;
print Dumper #sortup;
}elsif($COMMAND eq "sort down"){
my #sortdown = sort{ $b->{name} cmp $a->{name} } #$students;
print Dumper #sortdown;
//find a way to deference so to make a more organize user friendly read.
}else{
print "\n quit";
}
}
readLines();
#Output in random, the ordering of each users data is random
printf"please choose display order : ";
my $response = <STDIN>;
chomp $response;
printData($response);
The problem here is that you're expected Dumper to provide an organised output. It doesn't. It dumps a data structure to make debugging easier. The key problem will be that hashes are explicitly unordered data structures - they're key-value mappings, they don't produce any output order.
With reference to perldata:
Note that just because a hash is initialized in that order doesn't mean that it comes out in that order.
And specifically the keys function:
Hash entries are returned in an apparently random order. The actual random order is specific to a given hash; the exact same series of operations on two hashes may result in a different order for each hash.
There is a whole section in perlsec which explains this in more detail, but suffice to say - hashes are random order, which means whilst you're sorting your students by name, the key value pairs for each student isn't sorted.
I would suggest instead of:
my #sortdown = sort{ $b->{name} cmp $a->{name} } #$students;
print Dumper #sortdown;
You'd be better off with using a slice:
my #field_order = qw ( name age GPA MA );
foreach my $student ( sort { $b -> {name} cmp $a -> {name} } #$students ) {
print #{$student}{#field_order}, "\n";
}
Arrays (#field_order) are explicitly ordered, so you will always print your student fields in the same sequence. (Haven't fully tested for your example I'm afraid, because I don't have your source data, but this approach works with a sample data snippet).
If you do need to print the keys as well, then you may need a foreach loop instead:
foreach my $field ( #field_order ) {
print "$field => ", $student->{$field},"\n";
}
Or perhaps the more terse:
print "$_ => ", $student -> {$_},"\n" for #field_order;
I'm not sure I like that as much though, but that's perhaps a matter of taste.
The essence of your mistake is to assume that hashes will have a specific ordering. As #Sobrique explains, that assumption is wrong.
I assume you are trying to learn Perl, and therefore, some guidance on the basics will be useful:
#!bin/usr/perl
Your shebang line is wrong: On Windows, or if you run your script with perl script.pl, it will not matter, but you want to make sure the interpreter that is specified in that line uses an absolute path.
Also, you may not always want to use the perl interpreter that came with the system, in which case #!/usr/bin/env perl maybe helpful for one-off scripts.
use strict;
use warnings;
use Data::Dumper;
use 5.010;
I tend to prefer version constraints before pragmata (except in the case of utf8). Data::Dumper is a debugging aid, not something you use for human readable reports.
my $filename = $ARGV[0];
You should check if you were indeed given an argument on the command line as in:
#ARGV or die "Need filename\n";
my $filename = $ARGV[0];
open ( INPUT_FILE , '<', "$filename" ) or die "Could not open to read \n ";
File handles such as INPUT_FILE are called bareword filehandles. These have package scope. Instead, use lexical filehandles whose scope you can restrict to the smallest appropriate block.
There is no need to interpolate $filename in the third argument to open.
Always include the name of the file and the error message when dying from an error in open. Surrounding the filename with ' ' helps you identify any otherwise hard to detect characters that might be causing the problem (e.g. a newline or a space).
open my $input_fh, '<', $filename
or die "Could not open '$filename' for reading: $!";
sub readLines{
This is reading into an array you defined in global scope. What if you want to use the same subroutine to read records from two different files into two separate arrays? readLines should receive a filename as an argument, and return an arrayref as its output (see below).
while(my $currentLine = <INPUT_FILE>){
chomp($currentLine);
In most cases, you want all trailing whitespace removed, not just the line terminator.
my #myLine = split(/\s+/,$currentLine);
split on /\s+/ is different than split ' '. In most cases, the latter is infinitely more useful. Read about the differences in perldoc -f split.
my %temphash = (
name => "$myLine[0]",
age => "$myLine[1]",
GPA => "$myLine[2]",
MA => "$myLine[3]"
);
Again with the useless interpolation. There is no need to interpolate those values into fresh strings (except maybe in the case where they might be objects which overloaded the stringification, but, in this case, you know they are just plain strings.
pushToStudents(\%temphash);
No need for the extra pushToStudents subroutine in this case, unless this is a stub for a method that will later be able to load the data to a database or something. Even in that case, it be better to provide a callback to the function.
sub pushToStudents{
my $data = shift;
push $students ,$data;
}
You are pushing data to a global variable. A program where there can only ever be a single array of student records is not useful.
sub printData{
my $COMMAND = shift;
if($COMMAND eq "sort up"){
Don't do this. Every subroutine should have one clear purpose.
Here is a revised version of your program.
#!/usr/bin/env perl
use 5.010;
use strict;
use warnings;
use Carp qw( croak );
run(\#ARGV);
sub run {
my $argv = $_[0];
#$argv
or die "Need name of student records file\n";
open my $input_fh, '<', $argv->[0]
or croak "Cannot open '$argv->[0]' for reading: $!";
print_records(
read_student_records($input_fh),
prompt_sort_order(),
);
return;
}
sub read_student_records {
my $fh = shift;
my #records;
while (my $line = <$fh>) {
last unless $line =~ /\S/;
my #fields = split ' ', $line;
push #records, {
name => $fields[0],
age => $fields[1],
gpa => $fields[2],
ma => $fields[3],
};
}
return \#records;
}
sub print_records {
my $records = shift;
my $sorter = shift;
if ($sorter) {
$records = [ sort $sorter #$records ];
}
say "#{ $_ }{ qw( age name gpa ma )}" for #$records;
return;
}
sub prompt_sort_order {
my #sorters = (
[ "Input order", undef ],
[ "by name in ascending order", sub { $a->{name} cmp $b->{name} } ],
[ "by name in descending order", sub { $b->{name} cmp $a->{name} } ],
[ "by GPA in ascending order", sub { $a->{gpa} <=> $b->{gpa} } ],
[ "by GPA in descending order", sub { $b->{gpa} <=> $a->{gpa} } ],
);
while (1) {
print "Please choose the order in which you want to print the records\n";
print "[ $_ ] $sorters[$_ - 1][0]\n" for 1 .. #sorters;
printf "\n\t(%s)\n", join('/', 1 .. #sorters);
my ($response) = (<STDIN> =~ /\A \s*? ([1-9][0-9]*?) \s+ \z/x);
if (
$response and
($response >= 1) and
($response <= #sorters)
) {
return $sorters[ $response - 1][1];
}
}
# should not be reached
return;
}
Is there any way to get Perl to convert the stringified version e.g (ARRAY(0x8152c28)) of an array reference to the actual array reference?
For example
perl -e 'use Data::Dumper; $a = [1,2,3];$b = $a; $a = $a.""; warn Dumper (Then some magic happens);'
would yield
$VAR1 = [
1,
2,
3
];
Yes, you can do this (even without Inline C). An example:
use strict;
use warnings;
# make a stringified reference
my $array_ref = [ qw/foo bar baz/ ];
my $stringified_ref = "$array_ref";
use B; # core module providing introspection facilities
# extract the hex address
my ($addr) = $stringified_ref =~ /.*(0x\w+)/;
# fake up a B object of the correct class for this type of reference
# and convert it back to a real reference
my $real_ref = bless(\(0+hex $addr), "B::AV")->object_2svref;
print join(",", #$real_ref), "\n";
but don't do that. If your actual object is freed or reused, you may very well
end up getting segfaults.
Whatever you are actually trying to achieve, there is certainly a better way.
A comment to another answer reveals that the stringification is due to using a reference as a hash key. As responded to there, the better way to do that is the well-battle-tested
Tie::RefHash.
The first question is: do you really want to do this?
Where is that string coming from?
If it's coming from outside your Perl program, the pointer value (the hex digits) are going to be meaningless, and there's no way to do it.
If it's coming from inside your program, then there's no need to stringify it in the first place.
Yes, it's possible: use Devel::FindRef.
use strict;
use warnings;
use Data::Dumper;
use Devel::FindRef;
sub ref_again {
my $str = #_ ? shift : $_;
my ($addr) = map hex, ($str =~ /\((.+?)\)/);
Devel::FindRef::ptr2ref $addr;
}
my $ref = [1, 2, 3];
my $str = "$ref";
my $ref_again = ref_again($str);
print Dumper($ref_again);
The stringified version contains the memory address of the array object, so yes, you can recover it. This code works for me, anyway (Cygwin, perl 5.8):
use Inline C;
#a = (1,2,3,8,12,17);
$a = \#a . "";
print "Stringified array ref is $a\n";
($addr) = $a =~ /0x(\w+)/;
$addr = hex($addr);
$c = recover_arrayref($addr);
#c = #$c;
print join ":", #c;
__END__
__C__
AV* recover_arrayref(int av_address) { return (AV*) av_address; }
.
$ perl ref-to-av.pl
Stringified array ref is ARRAY(0x67ead8)
1:2:3:8:12:17
I'm not sure why you want to do this, but if you really need it, ignore the answers that use the tricks to look into memory. They'll only cause you problems.
Why do you want to do this? There's probably a better design. Where are you getting that stringified reference from.
Let's say you need to do it for whatever reason. First, create a registry of objects where the hash key is the stringified form, and the value is a weakened reference:
use Scalar::Util qw(weaken);
my $array = [ ... ];
$registry{ $array } = $array;
weaken( $registry{ $array } ); # doesn't count toward ref count
Now, when you have the stringified form, you just look it up in the hash, checking to see that it's still a reference:
if( ref $registry{$string} ) { ... }
You could also try Tie::RefHash and let it handle all of the details of this.
There is a longer example of this in Intermediate Perl.
In case someone finds this useful, I'm extending tobyink's answer by adding support for detecting segmentation faults. There are two approaches I discovered. The first way locally replaces $SIG{SEGV} and $SIG{BUS} before dereferencing. The second way masks the child signal and checks if a forked child can dereference successfully. The first way is significantly faster than the second.
Anyone is welcome to improve this answer.
First Approach
sub unstringify_ref($) {
use bigint qw(hex);
use Devel::FindRef;
my $str = #_ ? shift : $_;
if (defined $str and $str =~ /\((0x[a-fA-F0-9]+)\)$/) {
my $addr = (hex $1)->bstr;
local $#;
return eval {
local $SIG{SEGV} = sub { die };
local $SIG{BUS} = sub { die };
return Devel::FindRef::ptr2ref $addr;
};
}
return undef;
}
I'm not sure if any other signals can occur in an attempt to access illegal memory.
Second Approach
sub unstringify_ref($) {
use bigint qw(hex);
use Devel::FindRef;
use Signal::Mask;
my $str = #_ ? shift : $_;
if (defined $str and $str =~ /\((0x[a-fA-F0-9]+)\)$/) {
my $addr = (hex $1)->bstr;
local $!;
local $?;
local $Signal::Mask{CHLD} = 1;
if (defined(my $kid = fork)) {
# Child -- This might seg fault on invalid address.
exit(not Devel::FindRef::ptr2ref $addr) unless $kid;
# Parent
waitpid $kid, 0;
return Devel::FindRef::ptr2ref $addr if $? == 0;
} else {
warn 'Unable to fork: $!';
}
}
return undef;
}
I'm not sure if the return value of waitpid needs to be checked.
Is there any way to get Perl to convert the stringified version e.g (ARRAY(0x8152c28)) of an array reference to the actual array reference?
For example
perl -e 'use Data::Dumper; $a = [1,2,3];$b = $a; $a = $a.""; warn Dumper (Then some magic happens);'
would yield
$VAR1 = [
1,
2,
3
];
Yes, you can do this (even without Inline C). An example:
use strict;
use warnings;
# make a stringified reference
my $array_ref = [ qw/foo bar baz/ ];
my $stringified_ref = "$array_ref";
use B; # core module providing introspection facilities
# extract the hex address
my ($addr) = $stringified_ref =~ /.*(0x\w+)/;
# fake up a B object of the correct class for this type of reference
# and convert it back to a real reference
my $real_ref = bless(\(0+hex $addr), "B::AV")->object_2svref;
print join(",", #$real_ref), "\n";
but don't do that. If your actual object is freed or reused, you may very well
end up getting segfaults.
Whatever you are actually trying to achieve, there is certainly a better way.
A comment to another answer reveals that the stringification is due to using a reference as a hash key. As responded to there, the better way to do that is the well-battle-tested
Tie::RefHash.
The first question is: do you really want to do this?
Where is that string coming from?
If it's coming from outside your Perl program, the pointer value (the hex digits) are going to be meaningless, and there's no way to do it.
If it's coming from inside your program, then there's no need to stringify it in the first place.
Yes, it's possible: use Devel::FindRef.
use strict;
use warnings;
use Data::Dumper;
use Devel::FindRef;
sub ref_again {
my $str = #_ ? shift : $_;
my ($addr) = map hex, ($str =~ /\((.+?)\)/);
Devel::FindRef::ptr2ref $addr;
}
my $ref = [1, 2, 3];
my $str = "$ref";
my $ref_again = ref_again($str);
print Dumper($ref_again);
The stringified version contains the memory address of the array object, so yes, you can recover it. This code works for me, anyway (Cygwin, perl 5.8):
use Inline C;
#a = (1,2,3,8,12,17);
$a = \#a . "";
print "Stringified array ref is $a\n";
($addr) = $a =~ /0x(\w+)/;
$addr = hex($addr);
$c = recover_arrayref($addr);
#c = #$c;
print join ":", #c;
__END__
__C__
AV* recover_arrayref(int av_address) { return (AV*) av_address; }
.
$ perl ref-to-av.pl
Stringified array ref is ARRAY(0x67ead8)
1:2:3:8:12:17
I'm not sure why you want to do this, but if you really need it, ignore the answers that use the tricks to look into memory. They'll only cause you problems.
Why do you want to do this? There's probably a better design. Where are you getting that stringified reference from.
Let's say you need to do it for whatever reason. First, create a registry of objects where the hash key is the stringified form, and the value is a weakened reference:
use Scalar::Util qw(weaken);
my $array = [ ... ];
$registry{ $array } = $array;
weaken( $registry{ $array } ); # doesn't count toward ref count
Now, when you have the stringified form, you just look it up in the hash, checking to see that it's still a reference:
if( ref $registry{$string} ) { ... }
You could also try Tie::RefHash and let it handle all of the details of this.
There is a longer example of this in Intermediate Perl.
In case someone finds this useful, I'm extending tobyink's answer by adding support for detecting segmentation faults. There are two approaches I discovered. The first way locally replaces $SIG{SEGV} and $SIG{BUS} before dereferencing. The second way masks the child signal and checks if a forked child can dereference successfully. The first way is significantly faster than the second.
Anyone is welcome to improve this answer.
First Approach
sub unstringify_ref($) {
use bigint qw(hex);
use Devel::FindRef;
my $str = #_ ? shift : $_;
if (defined $str and $str =~ /\((0x[a-fA-F0-9]+)\)$/) {
my $addr = (hex $1)->bstr;
local $#;
return eval {
local $SIG{SEGV} = sub { die };
local $SIG{BUS} = sub { die };
return Devel::FindRef::ptr2ref $addr;
};
}
return undef;
}
I'm not sure if any other signals can occur in an attempt to access illegal memory.
Second Approach
sub unstringify_ref($) {
use bigint qw(hex);
use Devel::FindRef;
use Signal::Mask;
my $str = #_ ? shift : $_;
if (defined $str and $str =~ /\((0x[a-fA-F0-9]+)\)$/) {
my $addr = (hex $1)->bstr;
local $!;
local $?;
local $Signal::Mask{CHLD} = 1;
if (defined(my $kid = fork)) {
# Child -- This might seg fault on invalid address.
exit(not Devel::FindRef::ptr2ref $addr) unless $kid;
# Parent
waitpid $kid, 0;
return Devel::FindRef::ptr2ref $addr if $? == 0;
} else {
warn 'Unable to fork: $!';
}
}
return undef;
}
I'm not sure if the return value of waitpid needs to be checked.