How to print the array of arrays in perl - reference variable - perl

#! /usr/bin/perl
use strict;
use warnings;
use File::stat;
my $file_name = 0;
my $info = 0;
my $ret_mode = 0;
my $size;
my $last_mod;
my #array_file;
my $index = 0;
my #array_mode;
my #array_size;
my #array_last_mod;
foreach(#ARGV){
$file_name = $_;
$info = stat($file_name);
$size = $info->size;
$last_mod = scalar(localtime($info->mtime));
$ret_mode = $info->mode;
$ret_mode = $ret_mode & 0777;
$array_file[$index] = ($file_name);
$array_mode[$index] = ($ret_mode);
$array_size[$index] = ($size);
$array_last_mod[$index] = ($last_mod);
$ret_mode = 0;
$index++;
}
my #array_arrays = (#array_file, #array_mode, #array_size, #array_last_mod);
my $array_ref = \#array_arrays;
my $i = 0;
for(#$array_ref){
print "#$array_ref[$i]\n";
$i++;
}
I have created an array of arrays and I want to print the filename,mmode,size and last access time from the array of arrays created. Its not printing any values with,
for(#$array_ref){
print "#$array_ref[$i]\n";
$i++;
}

my #array_arrays = (#array_file, #array_mode, #array_size, #array_last_mod);
This statement does not create an array of arrays. Instead, It flattens the various arrays into one big flat list and then assigns that to #array_arrays. You want to assign array references. Obtain those with the reference operator \:
my #array_arrays = (\#array_file, \#array_mode, \#array_size, \#array_last_mod);
or with the shortcut
my #array_arrays = \(#array_file, #array_mode, #array_size, #array_last_mod);
Even then, your last foreach-loop is wrong. You probably meant
for my $i (0 .. $#{ $array_arrays[0] }) {
for my $aref (#array_arrays) {
print $aref->[$i], "\n";
}
}
or something similar.
Your code style could be improved.
Please don't declare all your variables at the top. Declare them in the tightest scope possible. Try to declare them at the point of initialization, e.g.
for my $file_name (#ARGV) {
my $info = stat($file_name);
my size = $info->size;
...
}
Don't prefix your array names with array_. The # sigil and/or subscripting with the [...] operator makes it clear that these are arrays.
$ret_mode & 0777 – The result should be $ret_mode itself: 0777 is 0b111111111. I.e. this removes all but the last 9 bits – you wouldn't care if there were more to the left.
$last_mod = scalar(localtime($info->mtime)); – due to the scalar assignment, localtime is already executed in scalar context. No need to make this explicit.
my $index = 0; ... for (...) { $array[$index] = ...; $index++ }. Please not. Just use push: for (...) { push #array, ... }. Don't maintain indices yourself unless you have to.
$ret_mode = 0; Why? you assign a new value during the next iteration anyway. Note that you should declare this variables inside the loop (see my point about tight scopes), which would create a new variable in each iteration, making this even more useless.
my $array_ref = \#array_arrays; .. #$array_ref[$i]. Isn't this a bit backwards? $array_arrays[$i] would work just as well. Note that in your defeferencing, # is probably the wrong sigil. You meant $$array_ref[$i].

Let's try a little something different.
First off, there's a nice syntax using the -> for referenced arrays and hashes. Here I am going to make a array of people. I'll make a hash of %person that contains all that person's information:
my %person;
my $person{NAME} = "Bob";
my $person{JOB} = "Programmer";
my $person{PHONE} = "555-1234";
Now, I'll put it into an array:
my #array
my $array[0] = \%person;
I could reference the person in the array this way:
print ${$array[0]}{NAME} . "\n"; #Prints Bob
print ${$array[0]}{JOB} . "\n"; #Prints Porgrammer
But, Perl gives me a nice clean way to do this:
print $array[0]->{NAME} . "\n"; #Prints Bob
print $array[0]->{JOB} . "\n"; #Prints Progammer
In fact, I could skip the hash all together. Here I am adding Jill to my array:
$array[1]->{NAME} = "Jill";
$array[1]->{JOB} = "DBA";
$array[1]->{PHONE} = "555-5555";
You can see this is a much simpler way to use references. It's easier to see what is going on and takes fewer lines of code.
You can refer to an array of an array like this:
$myarray[1]->[3] = 42;
Or have a hash which stores an array. In this day and age, who has only a single phone number?:
$person[1]->{PHONE}->[0] = "555-4567";
$person[1]->{PHONE}->[1] = "555-4444";
Or, to make it even more complex we could have a hash of a hash of an array:
$person[1]->{PHONE}->{CELL}->[0] = "555-1111";
$person[1]->{PHONE}->{CELL}->[1] = "555-2222";
$person[1]->{PHONE}->{HOME}->[0] = "555-3333";
$person[1]->{PHONE}->{JOB}->[0] = "555-4444";
$person[1]->{PHONE}->{JOB}->[1] = "555-5555";
Using this syntax will really help clean up a lot of your code. You won't have to store the information into individual structures that are then only used to make references. Instead, you can simple setup your structure the way you want without intermediary steps.
Now to your problem: You're trying to store a bunch of information about files into a series of arrays. What you're hoping is that $array_mode[1] goes with $array_file[1] and you have to keep all of these arrays in sync. This is a pain and it is complex.
The entire purpose of using references is to eliminate this need for multiple variables. If you're going to use references, why not simply store your entire file structure into a single array.
What you really, really want is an array of hash references. And, that hash reference will be keyed based upon your file attributes. Here is your code restructured into using an array of hash references. I didn't even bother to check the rest of it. For example, I'm not sure how your localtime thing will work:
use strict;
use warnings;
use feature qw(say);
use File::stat;
my #files;
for my $file ( #ARGV ) {
my $info = stat( $file );
my $file = {}; #This will be a reference to a hash
$file->{NAME} = $file;
$file->{SIZE} = $info->size;
$file->{RET_MODE} = $info->mode & 0777;
$file->{LAST_MOD} = localtime $info->mtime; #Does this work?
push #files, $file #Pushes the hash reference onto the array
}
That's way shorter and cleaner. Plus, you know that $files[0]->{NAME} goes with $files[1]->{SIZE}, and if you remove $files[0] from your array, or transfer it to another variable, all of the attributes of that file go together.
Here's how you'd print it out:
for my $file ( #files ) {
say "File Name: " . $file->{NAME};
say "File Size: " . $file->{SIZE};
say "Last Modified: " . $file->{LAST_MOD};
say "File Mode: " . $file->{RET_MODE};
}
Simple and easy to do.
However, I would argue what you really want is a hash of hashes. Let your file name be the key to your main hash, and let {SIZE}, {LAST_MOD}, and {RET_MODE} be the keys to your sub hash:
my %files = {}; #This is a hash of hashes
for my $file_name ( #ARGV ) {
my $info = stat( $file );
$files{$file_name}->{SIZE} = $info->size;
$files{$file_name}->{RET_MODE} = $info->mode & 0777;
$files{$file_name}->{LAST_MOD} = localtime $info->mtime; #Does this work?
}
Now if someone asks, "When was foo.txt last modified?", you can say:
say "File 'foo.txt' was last modified on " . $file{foo.txt}->{LAST_MOD};
And to print out your entire structure:
for my $file_name ( sort keys %files ) {
say "File: $file_name";
for my attribute ( sort keys %{ $file_name } ) {
say " $attribute: " . $files{$file_name}->{$attribute};
}
}
Next step is to learn about Object Oriented Perl! Object Oriented Perl uses these types of references, but will greatly simplify the handling of these references, so you make fewer programming mistakes.

Related

Perl,Move file handle in main and subroutine

The propose of the script to grep some value from some data table in the ASCII files.
I modified the
script which I posted yesterday.
Now it barely works. I wonder if it is the proper way to move a file handle in this way.
The usage is still the same
myscript.pl targetfolder/*> result.csv
F is my file handle.
The argument I passed to the subroutine is the scalar $_, which is used by the if condition. When I want to move downward in my subroutine next if 1..4 will not work, so I repeat $a = <F>; a few times to achieve moving file handle downward.
But I think this is not a proper way to move the same file handle both in my main code and my subroutine. I am not sure it will really go through every line. I need your advice.
myscript.pl
#Report strip
use warnings;
use strict;
##Print the title
Tfms2();
##Print the title
print "\n";
#ff = <#ARGV>;
foreach $ff ( #ff ) {
open (F, $ff);
#fswf = #fschuck = #fsxpos = #fsypos = #fsdev = #csnom = "";
#cswf = #cschuck = #csxpos = #csypos = #csnom = ""; # is there an efficient way?
while (<F>) {
Mfms2();
Mfms3();
}
$r = 1;
while ( $r <= $#fswf ) { # because #fsws is the largest array
Cfms3();
print "\n";
$r++;
}
close (F);
}
##==========================================================================================
##Subs
##==========================================================================================
##FS II
sub Tfms2 {
print "FS_Wafer,FS_ChuckID,FS_pos_X,FS_pos_Y,FS_deviation,CS_Wafer,CS_ChuckID,CS_pos_X,CS_pos_Y,CS_NofWafer_Ident_Spot";
}
sub Mfms2 {
if ( /^F\sM\sSTATISTICS\sII$/ ) {
$a = (<F>);
$a = (<F>);
$a = (<F>);
$a = (<F>);
$a = (<F>);
$a = (<F>);
$a = (<F>);
$a = (<F>);
$r = 1;
#b = "";
while ( $a !~ /\+\-/ ) {
chomp $a;
#b = split / *\| */, $a;
$fswf[$r] = $b[1];
$fschuck[$r] = $b[2];
$fsxpos[$r] = $b[3];
$fsypos[$r] = $b[4];
$fsdev[$r] = $b[5];
$r++;
$a = (<F>);
#b = "";
}
}
}
##FS III
sub Mfms3 {
if ( /^F\sM\sSTATISTICS\sIII$/ ) {
$a = (<F>);
$a = (<F>);
$a = (<F>);
$a = (<F>);
$a = (<F>);
$a = (<F>);
$a = (<F>);
$a = (<F>);
$r = 1;
#b = "";
while ( $a !~ /\+\-/ ) {
chomp $a;
#b = split / *\| */, $a;
$cswf[$r] = $b[1];
$cschuck[$r] = $b[2];
$csxpos[$r] = $b[3];
$csypos[$r] = $b[4];
$csnom[$r] = $b[5];
$r++;
$a = (<F>);
#b = "";
}
}
}
sub Cfms3 {
print "$fswf[$r],$fschuck[$r],$fsxpos[$r],$fsypos[$r],$fsdev[$r],";
print "$cswf[$r],$cschuck[$r],$csxpos[$r],$csypos[$r],$csnom[$r],";
}
You have forgotten to tell us what the program is supposed to do, so it's very hard to be any help. There might be more information in your previous question, but we don't all read every question here and you don't even include a link to your previous question.
The answer to your question is that you can use seek() to move to an arbitrary position in a file. You might also find it useful to look at tell() which can tell you where you currently are in a file.
But I don't think that information will be particularly helpful to you as you seem rather confused about what you're trying to do. If you were to explain your task in a bit more detail, then I strongly suspect that we could help you (and I also suspect that help would largely involve rewriting your code from scratch). But until you give us details, all we can do is to point out some of the more obvious problems with your code.
You should always include use strict and use warnings in your code.
I don't think that #ff = <#ARGV> does what you think it does. I think you want #ff = #ARGV instead. And, even then, that feels like you're copying #ARGV pointlessly.
Having an array (#ff) and a scalar ($ff) with the same name is going to confuse someone at some point. And, more generally, you need to put more effort into naming your variables clearly.
Please use the three-argument version of open() along with lexical filehandles. You should also always check the return code from open() - open my $fh, "<", $ff or die "Could not open $ff: $!".
Your lines like #fswf=#fschuck=#fsxpos=#fsypos=#fsdev=#csnom="" aren't doing what you think they are doing. You end up with a lot of arrays each of which contain a single element. Better to just declare the arrays with my - Perl will create them empty (my (#fswf, #fschuck, #fsxpos, ...)).
If you really need to skip eight lines when reading from your file then it's much clearer to write: $a = <F> for 1 .. 8.
You are making heavy use of global variables. There's a good reason why this goes against software engineering best practices. It makes your code far more fragile than it needs to be.
All in all, you seem to be guessing at a solution here, and that's never a good approach. As I said above, we'd like to help you, but without a lot more information that's going to be almost impossible.
Update: Looking at your code a bit more closely, I see that you are storing the data that you parse from the files in a number of arrays. Each array contains data from a single column of the input file and in order to get all of the data from a single row, you need to access each of these arrays using the same index value. This isn't a very good idea. Splitting linked data across different variables is a recipe for disaster. A far better idea would be to store each record in a hash (where the key would denote the data item that is stored) and to store references to all of these hashes in an array. This brings all of your data together in a single variable.
Updated update: I don't know enough about your data to be sure, but here's the kind of approach I would take. I've only parsed the data and then used Data::Dumper to display the parsed data structure. Producing better output is left as an exercise for the reader :-)
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
#ARGV or die "Usage: $0 file [file...]\n";
# Define two list of keys for the different types
my %cols = (
II => [qw(wf chuck xpos ypos dev)],
III => [qw(wf chuck xpos ypos nom)],
);
my #data; # To store the parsed data
my $cur_type; # Keep track of the current type of record
while (<>) {
# Look for a record header and determine which type of
# record we're dealing with (I think 'II' or 'III').
if (/^F M STATISTICS (III?)$/) {
$cur_type = $1;
next;
}
# Skip lines that are just headers/footers
next if /\+[-=]/;
# Skip lines that don't include data
next unless /\d/;
chomp;
# Remove the start and end of the line
s/^\|\s+//;
s/\s+\|$//;
# Store the data in a hash with the correct keys
# for this type of record
my %rec;
#rec{#{$cols{$cur_type}}} = split /\s+\|\s+/;
# Store a reference to our hash in #data
push #data, \%rec;
}
# Dump the contents of #data
say Dumper \#data;

How to copy a nested hash

How to copy a multi level nested hash(say, %A) to another hash(say, %B)? I want to make sure that the new hash does not contain same references(pointers) as the original hash(%A).
If I change anything in the original hash (%A), it should not change
anything in the new hash(%B).
I want a generic way do it. I know I can do it by reassigning by value
for each level of keys(like, %{ $b{kb} } = %a;).
But, there should be a solution which would work irrespective of the number of key levels(hash of hash of hash of .... hash of hash)
PROBLEM EXAMPLE
use Data::Dumper;
my %a=(q=>{
q1=>1,
q2=>2,
},
w=>2);
my %b;
my %c;
%{ $b{kb} } = %a;
print "\%b=[".Data::Dumper::Dumper (%b)."] ";
%{ $c{kc} } = %a; # $b{kb} = \%a;
print "\n\%c=[".Data::Dumper::Dumper (%c)."] ";
# CHANGE THE VALUE OF KEY IN ORIGINAL HASH %a
$a{q}{q1} = 2; # $c{kc} = \%a;
print "\n\%b=[".Data::Dumper::Dumper (%b)."] ";
print "\n\%c=[".Data::Dumper::Dumper (%c)."] ";
Appreciate your help
What you want is commonly known as a "deep copy", where as the assignment operator does a "shallow copy".
use Storable qw( dclone );
my $copy = dclone($src);

Is it possible to convert a stringified reference from a SCALAR back to a REF? [duplicate]

Is there any way to get Perl to convert the stringified version e.g (ARRAY(0x8152c28)) of an array reference to the actual array reference?
For example
perl -e 'use Data::Dumper; $a = [1,2,3];$b = $a; $a = $a.""; warn Dumper (Then some magic happens);'
would yield
$VAR1 = [
1,
2,
3
];
Yes, you can do this (even without Inline C). An example:
use strict;
use warnings;
# make a stringified reference
my $array_ref = [ qw/foo bar baz/ ];
my $stringified_ref = "$array_ref";
use B; # core module providing introspection facilities
# extract the hex address
my ($addr) = $stringified_ref =~ /.*(0x\w+)/;
# fake up a B object of the correct class for this type of reference
# and convert it back to a real reference
my $real_ref = bless(\(0+hex $addr), "B::AV")->object_2svref;
print join(",", #$real_ref), "\n";
but don't do that. If your actual object is freed or reused, you may very well
end up getting segfaults.
Whatever you are actually trying to achieve, there is certainly a better way.
A comment to another answer reveals that the stringification is due to using a reference as a hash key. As responded to there, the better way to do that is the well-battle-tested
Tie::RefHash.
The first question is: do you really want to do this?
Where is that string coming from?
If it's coming from outside your Perl program, the pointer value (the hex digits) are going to be meaningless, and there's no way to do it.
If it's coming from inside your program, then there's no need to stringify it in the first place.
Yes, it's possible: use Devel::FindRef.
use strict;
use warnings;
use Data::Dumper;
use Devel::FindRef;
sub ref_again {
my $str = #_ ? shift : $_;
my ($addr) = map hex, ($str =~ /\((.+?)\)/);
Devel::FindRef::ptr2ref $addr;
}
my $ref = [1, 2, 3];
my $str = "$ref";
my $ref_again = ref_again($str);
print Dumper($ref_again);
The stringified version contains the memory address of the array object, so yes, you can recover it. This code works for me, anyway (Cygwin, perl 5.8):
use Inline C;
#a = (1,2,3,8,12,17);
$a = \#a . "";
print "Stringified array ref is $a\n";
($addr) = $a =~ /0x(\w+)/;
$addr = hex($addr);
$c = recover_arrayref($addr);
#c = #$c;
print join ":", #c;
__END__
__C__
AV* recover_arrayref(int av_address) { return (AV*) av_address; }
.
$ perl ref-to-av.pl
Stringified array ref is ARRAY(0x67ead8)
1:2:3:8:12:17
I'm not sure why you want to do this, but if you really need it, ignore the answers that use the tricks to look into memory. They'll only cause you problems.
Why do you want to do this? There's probably a better design. Where are you getting that stringified reference from.
Let's say you need to do it for whatever reason. First, create a registry of objects where the hash key is the stringified form, and the value is a weakened reference:
use Scalar::Util qw(weaken);
my $array = [ ... ];
$registry{ $array } = $array;
weaken( $registry{ $array } ); # doesn't count toward ref count
Now, when you have the stringified form, you just look it up in the hash, checking to see that it's still a reference:
if( ref $registry{$string} ) { ... }
You could also try Tie::RefHash and let it handle all of the details of this.
There is a longer example of this in Intermediate Perl.
In case someone finds this useful, I'm extending tobyink's answer by adding support for detecting segmentation faults. There are two approaches I discovered. The first way locally replaces $SIG{SEGV} and $SIG{BUS} before dereferencing. The second way masks the child signal and checks if a forked child can dereference successfully. The first way is significantly faster than the second.
Anyone is welcome to improve this answer.
First Approach
sub unstringify_ref($) {
use bigint qw(hex);
use Devel::FindRef;
my $str = #_ ? shift : $_;
if (defined $str and $str =~ /\((0x[a-fA-F0-9]+)\)$/) {
my $addr = (hex $1)->bstr;
local $#;
return eval {
local $SIG{SEGV} = sub { die };
local $SIG{BUS} = sub { die };
return Devel::FindRef::ptr2ref $addr;
};
}
return undef;
}
I'm not sure if any other signals can occur in an attempt to access illegal memory.
Second Approach
sub unstringify_ref($) {
use bigint qw(hex);
use Devel::FindRef;
use Signal::Mask;
my $str = #_ ? shift : $_;
if (defined $str and $str =~ /\((0x[a-fA-F0-9]+)\)$/) {
my $addr = (hex $1)->bstr;
local $!;
local $?;
local $Signal::Mask{CHLD} = 1;
if (defined(my $kid = fork)) {
# Child -- This might seg fault on invalid address.
exit(not Devel::FindRef::ptr2ref $addr) unless $kid;
# Parent
waitpid $kid, 0;
return Devel::FindRef::ptr2ref $addr if $? == 0;
} else {
warn 'Unable to fork: $!';
}
}
return undef;
}
I'm not sure if the return value of waitpid needs to be checked.

three questions on a Perl function

I am trying to use an existing Perl program, which includes the following function of GetItems. The way to call this function is listed in the following.
I have several questions for this program:
what does foreach my $ref (#_) aim to do? I think #_ should be related to the parameters passed, but not quite sure.
In my #items = sort { $a <=> $b } keys %items; the "items" on the left side should be different from the "items" on the right side? Why do they use the same name?
What does $items{$items[$i]} = $i + 1; aim to do? Looks like it just sets up the value for the hash $items sequentially.
$items = GetItems($classes, $pVectors, $nVectors, $uVectors);
######################################
sub GetItems
######################################
{
my $classes = shift;
my %items = ();
foreach my $ref (#_)
{
foreach my $id (keys %$ref)
{
foreach my $cui (keys %{$ref->{$id}}) { $items{$cui} = 1 }
}
}
my #items = sort { $a <=> $b } keys %items;
open(VAL, "> $classes.items");
for my $i (0 .. $#items)
{
print VAL "$items[$i]\n";
$items{$items[$i]} = $i + 1;
}
close VAL;
return \%items;
}
When you enter a function, #_ starts out as an array of (aliases to) all the parameters passed into the function; but the my $classes = shift removes the first element of #_ and stores it in the variable $classes, so the foreach my $ref (#_) iterates over all the remaining parameters, storing (aliases to) them one at a time in $ref.
Scalars, hashes, and arrays are all distinguished by the syntax, so they're allowed to have the same name. You can have a $foo, a #foo, and a %foo all at the same time, and they don't have to have any relationship to each other. (This, together with the fact that $foo[0] refers to #foo and $foo{'a'} refers to %foo, causes a lot of confusion for newcomers to the language; you're not alone.)
Exactly. It sets each element of %items to a distinct integer ranging from one to the number of elements, proceeding in numeric (!) order by key.
foreach my $ref (#_) loops through each hash reference passed as a parameter to GetItems. If the call looks like this:
$items = GetItems($classes, $pVectors, $nVectors, $uVectors);
then the loop processes the hash refs in $pVector, $nVectors, and $uVectors.
#items and %items are COMPLETELY DIFFERENT VARIABLES!! #items is an array variable and %items is a hash variable.
$items{$items[$i]} = $i + 1 does exactly as you say. It sets the value of the %items hash whose key is $items[$i] to $i+1.
Here is an (nearly) line by line description of what is happening in the subroutine
Define a sub named GetItems.
sub GetItems {
Store the first value in the default array #_, and remove it from the array.
my $classes = shift;
Create a new hash named %items.
my %items;
Loop over the remaining values given to the subroutine, setting $ref to the value on each iteration.
for my $ref (#_){
This code assumes that the previous line set $ref to a hash ref. It loops over the unsorted keys of the hash referenced by $ref, storing the key in $id.
for my $id (keys %$ref){
Using the key ($id) given by the previous line, loop over the keys of the hash ref at that position in $ref. While also setting the value of $cui.
for my $cui (keys %{$ref->{$id}}) {
Set the value of %item at position $cui, to 1.
$items{$cui} = 1;
End of the loops on the previous lines.
}
}
}
Store a sorted list of the keys of %items in #items according to numeric value.
my #items = sort { $a <=> $b } keys %items;
Open the file named by $classes with .items appended to it. This uses the old-style two arg form of open. It also ignores the return value of open, so it continues on to the next line even on error. It stores the file handle in the global *VAL{IO}.
open(VAL, "> $classes.items");
Loop over a list of indexes of #items.
for my $i (0 .. $#items){
Print the value at that index on it's own line to *VAL{IO}.
print VAL "$items[$i]\n";
Using that same value as an index into %items (which it is a key of) to the index plus one.
$items{$items[$i]} = $i + 1;
End of loop.
}
Close the file handle *VAL{IO}.
close VAL;
Return a reference to the hash %items.
return \%items;
End of subroutine.
}
I have several questions for this program:
What does foreach my $ref (#_) aim to do? I think #_ should be related to the parameters passed, but not quite sure.
Yes, you are correct. When you pass parameters into a subroutine, they automatically are placed in the #_ array. (Called a list in Perl). The foreach my $ref (#_) begins a loop. This loop will be repeated for each item in the #_ array, and each time, the value of $ref will be assigned the next item in the array. See Perldoc's Perlsyn (Perl Syntax) section about for loops and foreach loops. Also look at Perldoc's Perlvar (Perl Variables) section of General variables for information about special variables like #_.
Now, the line my $classes = shift; is removing the first item in the #_ list and putting it into the variable $classes. Thus, the foreach loop will be repeated three times. Each time, $ref will be first set to the value of $pVectors, $nVectors, and finally $uVectors.
By the way, these aren't really scalar values. In Perl, you can have what is called a reference. This is the memory location of the data structure you're referencing. For example, I have five students, and each student has a series of tests they've taken. I want to store all the values of each test in a hash keyed by the student's ID.
Normally, each entry in the hash can only contain a single item. However, what if this item refers to a list that contains the student's grades?
Here's the list of student #100's grade:
#grades = (100, 93, 89, 95, 74);
And here's how I set Student 100's entry in my hash:
$student{100} = \#grades;
Now, I can talk about the first grade of the year for Student #100 as $student{100}[0]. See the Perldoc's Mark's very short tutorial about references.
In my #items = sort { $a <=> $b } keys %items; the "items" on the left side should be different from the "items" on the right side? Why do they use the same name?
In Perl, you have three major types of variables: Lists (what some people call Arrays), Hashes (what some people call Keyed Arrays), and Scalars. In Perl, it is perfectly legal to have different variable types have the same name. Thus, you can have $var, %var, and #var in your program, and they'll be treated as completely separate variables1.
This is usually a bad thing to do and is highly discouraged. It gets worse when you think of the individual values: $var refers to the scalar while $var[3] refers to the list, and $var{3} refers to the hash. Yes, it can be very, very confusing.
In this particular case, he has a hash (a keyed array) called %item, and he's converting the keys in this hash into a list sorted by the keys. This syntax could be simplified from:
my #items = sort { $a <=> $b } keys %items;
to just:
my #items = sort keys %items;
See the Perldocs on the sort function and the keys function.
What does $items{$items[$i]} = $i + 1; aim to do? Looks like it just sets up the value for the hash $items sequentially.
Let's look at the entire loop:
foreach my $i (0 .. $#items)
{
print VAL "$items[$i]\n";
$items{$items[$i]} = $i + 1;
}
The subroutine is going to loop through this loop once for each item in the #items list. This is the sorted list of keys to the old %items hash. The $#items means the largest index in the item list. For example, if #items = ("foo", "bar", and "foobar"), then $#item would be 2 because the last item in this list is $item[2] which equals foobar.
This way, he's hitting the index of each entry in #items. (REMEMBER: This is different from %item!).
The next line is a bit tricky:
$items{$items[$i]} = $i + 1;
Remember that $item{} refers to the old %items hash! He's creating a new %items hash. This is being keyed by each item in the #items list. And, the value is being set to the index of that item plus 1. Let's assume that:
#items = ("foo", "bar", "foobar")
In the end, he's doing this:
$item{foo} = 1;
$item{bar} = 2;
$item{foobar} = 3;
1 Well, this isn't 100% true. Perl stores each variable in a kind of hash structure. In memory, $var, #var, and %var will be stored in the same hash entry in memory, but in positions related to each variable type. 99.9999% of the time, this matters not one bit. As far as you are concerned, these are three completely different variables.
However, there are a few rare occasions where a programmer will take advantage of this when they futz directly with memory in Perl.
I want to show you how I would write that subroutine.
Bur first, I want to show you some of the steps of how, and why, I changed the code.
Reduce the number of for loops:
First off this loop doesn't need to set the value of $items{$cui} to anything in particular. It also doesn't have to be a loop at all.
foreach my $cui (keys %{$ref->{$id}}) { $items{$cui} = 1 }
This does practically the same thing. The only real difference is it sets them all to undef instead.
#items{ keys %{$ref->{$id}} } = ();
If you really needed to set the values to 1. Note that (1)x#keys returns a list of 1's with the same number of elements in #keys.
my #keys = keys %{$ref->{$id}};
#items{ #keys } = (1) x #keys;
If you are going to have to loop over a very large number of elements then a for loop may be a good idea, but only if you have to set the value to something other than undef. Since we are only using the loop variable once, to do something simple; I would use this code:
$items{$_} = 1 for keys %{$ref->{$id}};
Swap keys with values:
On the line before that we see:
foreach my $id (keys %$ref){
In case you didn't notice $id was used only once, and that was for getting the associated value.
That means we can use values and get rid of the %{$ref->{$id}} syntax.
for my $hash (values %$ref){
#items{ keys %$hash } = ();
}
( $hash isn't a good name, but I don't know what it represents. )
3 arg open:
It isn't recommended to use the two argument form of open, or to blindly use the bareword style of filehandles.
open(VAL, "> $classes.items");
As an aside, did you know there is also a one argument form of open. I don't really recommend it though, it's mostly there for backward compatibility.
our $VAL = "> $classes.items";
open(VAL);
The recommend way to do it, is with 3 arguments.
open my $val, '>', "$classes.items";
There may be some rare edge cases where you need/want to use the two argument version though.
Put it all together:
sub GetItems {
# this will cause open and close to die on error (in this subroutine only)
use autodie;
my $classes = shift;
my %items;
for my $vector_hash (#_){
# use values so that we don't have to use $ref->{$id}
for my $hash (values %$ref){
# create the keys in %items
#items{keys %$hash} = ();
}
}
# This assumes that the keys of %items are numbers
my #items = sort { $a <=> $b } keys %items;
# using 3 arg open
open my $output, '>', "$classes.items";
my $index; # = 0;
for $item (#items){
print {$output} $item, "\n";
$items{$item} = ++$index; # 1...
}
close $output;
return \%items;
}
Another option for that last for loop.
for my $index ( 1..#items ){
my $item = $items[$index-1];
print {$output} $item, "\n";
$items{$item} = $index;
}
If your version of Perl is 5.12 or newer, you could write that last for loop like this:
while( my($index,$item) = each #items ){
print {$output} $item, "\n";
$items{$item} = $index + 1;
}

How can I convert the stringified version of array reference to actual array reference in Perl?

Is there any way to get Perl to convert the stringified version e.g (ARRAY(0x8152c28)) of an array reference to the actual array reference?
For example
perl -e 'use Data::Dumper; $a = [1,2,3];$b = $a; $a = $a.""; warn Dumper (Then some magic happens);'
would yield
$VAR1 = [
1,
2,
3
];
Yes, you can do this (even without Inline C). An example:
use strict;
use warnings;
# make a stringified reference
my $array_ref = [ qw/foo bar baz/ ];
my $stringified_ref = "$array_ref";
use B; # core module providing introspection facilities
# extract the hex address
my ($addr) = $stringified_ref =~ /.*(0x\w+)/;
# fake up a B object of the correct class for this type of reference
# and convert it back to a real reference
my $real_ref = bless(\(0+hex $addr), "B::AV")->object_2svref;
print join(",", #$real_ref), "\n";
but don't do that. If your actual object is freed or reused, you may very well
end up getting segfaults.
Whatever you are actually trying to achieve, there is certainly a better way.
A comment to another answer reveals that the stringification is due to using a reference as a hash key. As responded to there, the better way to do that is the well-battle-tested
Tie::RefHash.
The first question is: do you really want to do this?
Where is that string coming from?
If it's coming from outside your Perl program, the pointer value (the hex digits) are going to be meaningless, and there's no way to do it.
If it's coming from inside your program, then there's no need to stringify it in the first place.
Yes, it's possible: use Devel::FindRef.
use strict;
use warnings;
use Data::Dumper;
use Devel::FindRef;
sub ref_again {
my $str = #_ ? shift : $_;
my ($addr) = map hex, ($str =~ /\((.+?)\)/);
Devel::FindRef::ptr2ref $addr;
}
my $ref = [1, 2, 3];
my $str = "$ref";
my $ref_again = ref_again($str);
print Dumper($ref_again);
The stringified version contains the memory address of the array object, so yes, you can recover it. This code works for me, anyway (Cygwin, perl 5.8):
use Inline C;
#a = (1,2,3,8,12,17);
$a = \#a . "";
print "Stringified array ref is $a\n";
($addr) = $a =~ /0x(\w+)/;
$addr = hex($addr);
$c = recover_arrayref($addr);
#c = #$c;
print join ":", #c;
__END__
__C__
AV* recover_arrayref(int av_address) { return (AV*) av_address; }
.
$ perl ref-to-av.pl
Stringified array ref is ARRAY(0x67ead8)
1:2:3:8:12:17
I'm not sure why you want to do this, but if you really need it, ignore the answers that use the tricks to look into memory. They'll only cause you problems.
Why do you want to do this? There's probably a better design. Where are you getting that stringified reference from.
Let's say you need to do it for whatever reason. First, create a registry of objects where the hash key is the stringified form, and the value is a weakened reference:
use Scalar::Util qw(weaken);
my $array = [ ... ];
$registry{ $array } = $array;
weaken( $registry{ $array } ); # doesn't count toward ref count
Now, when you have the stringified form, you just look it up in the hash, checking to see that it's still a reference:
if( ref $registry{$string} ) { ... }
You could also try Tie::RefHash and let it handle all of the details of this.
There is a longer example of this in Intermediate Perl.
In case someone finds this useful, I'm extending tobyink's answer by adding support for detecting segmentation faults. There are two approaches I discovered. The first way locally replaces $SIG{SEGV} and $SIG{BUS} before dereferencing. The second way masks the child signal and checks if a forked child can dereference successfully. The first way is significantly faster than the second.
Anyone is welcome to improve this answer.
First Approach
sub unstringify_ref($) {
use bigint qw(hex);
use Devel::FindRef;
my $str = #_ ? shift : $_;
if (defined $str and $str =~ /\((0x[a-fA-F0-9]+)\)$/) {
my $addr = (hex $1)->bstr;
local $#;
return eval {
local $SIG{SEGV} = sub { die };
local $SIG{BUS} = sub { die };
return Devel::FindRef::ptr2ref $addr;
};
}
return undef;
}
I'm not sure if any other signals can occur in an attempt to access illegal memory.
Second Approach
sub unstringify_ref($) {
use bigint qw(hex);
use Devel::FindRef;
use Signal::Mask;
my $str = #_ ? shift : $_;
if (defined $str and $str =~ /\((0x[a-fA-F0-9]+)\)$/) {
my $addr = (hex $1)->bstr;
local $!;
local $?;
local $Signal::Mask{CHLD} = 1;
if (defined(my $kid = fork)) {
# Child -- This might seg fault on invalid address.
exit(not Devel::FindRef::ptr2ref $addr) unless $kid;
# Parent
waitpid $kid, 0;
return Devel::FindRef::ptr2ref $addr if $? == 0;
} else {
warn 'Unable to fork: $!';
}
}
return undef;
}
I'm not sure if the return value of waitpid needs to be checked.