Perl hash metainformation - perl

Is it possible to store information about a hash, in it?
And by that I mean, without adding the information to the hash in the ordinary way, which would affect keys, values etc.
Thing is I am reading a twod_array into a hash but would like to store the order within the original array without affecting how one traverses through the hash etc.
so for instance:
my #the_keys=keys %the_hash;
should not return the information about the order of the hash.
Is there a way to store meta data within a hash?

You can store arbitrary metadata with the tie mechanism. Minimal example with a package storage that does not affect the standard hash interface:
package MetadataHash;
use Tie::Hash;
use base 'Tie::StdHash';
use Scalar::Util qw(refaddr);
our %INSERT_ORDER;
sub STORE {
my ($h, $k, $v) = #_;
$h->{$k} = $v;
push #{ $INSERT_ORDER{refaddr $h} }, $k;
}
1;
package main;
tie my %h, 'MetadataHash';
%h = ( I => 1, n => 2, d => 3, e => 4 );
$h{x} = 5;
# %MetadataHash::INSERT_ORDER is (9042936 => ['I', 'n', 'd', 'e', 'x'])
print keys %h;
# 'enIxd'

Well, one can always use Tie::Hash::Indexed, I suppose:
use Tie::Hash::Indexed;
tie my %hash, 'Tie::Hash::Indexed';
%hash = ( I => 1, n => 2, d => 3, e => 4 );
$hash{x} = 5;
print keys %hash, "\n"; # prints 'Index'
print values %hash, "\n"; # prints '12345'

Related

what is the difference between taking reference to a variable using \ and {},[] in perl?

below code works fine but if I replace push #array,{%hash} with push #array,\%hash then it doesn't. Can someone please help me understand the difference. I believe {%hash} refers to an anonymous hash. Does it mean a anonymous hash lives longer than a reference to a named hash ( \%hash ).
use strict;
use warnings;
use Data::Dumper;
my #array;
my %hash;
%hash = ('a' => 1,
'b' => 2,
'c' => 3,);
push #array,{%hash};
%hash = ('e' => 1,
'f' => 2,
'd' => 3,);
push #array,{%hash};
print Dumper \#array;
output
$VAR1 = [
{
'c' => 3,
'a' => 1,
'b' => 2
},
{
'e' => 1,
'd' => 3,
'f' => 2
}
];
UPDATE
Below is the actual code I am working on. I think in this case taking copy of the reference is the only possible solution I believe. Please correct me if I am wrong.
use Data::Dumper;
use strict;
use warnings;
my %csv_data;
my %temp_hash;
my #cols_of_interest = qw(dev_file test_file diff_file status);
<DATA>; #Skipping the header
while (my $row = <DATA>) {
chomp $row;
my #array = split /,/,$row;
#temp_hash{#cols_of_interest} = #array[3..$#array];
push #{$csv_data{$array[0]}{$array[1] . ':' . $array[2]}},{%temp_hash};
}
print Dumper \%csv_data;
__DATA__
dom,type,id,dev_file,test_file,diff_file,status
A,alpha,1234,dev_file_1234_1.txt,test_file_1234_1.txt,diff_file_1234_1.txt,pass
A,alpha,1234,dev_file_1234_2.txt,test_file_1234_2.txt,diff_file_1234_2.txt,fail
A,alpha,1234,dev_file_1234_3.txt,test_file_1234_3.txt,diff_file_1234_3.txt,pass
B,beta,4567,dev_file_4567_1.txt,test_file_4567_1.txt,diff_file_4567_1.txt,pass
B,beta,4567,dev_file_4567_2.txt,test_file_4567_2.txt,diff_file_4567_2.txt,fail
C,gamma,3435,dev_file_3435_1.txt,test_file_3435_1.txt,diff_file_3435_1.txt,pass
D,hexa,6768,dev_file_6768_1.txt,test_file_6768_1.txt,diff_file_6768_1.txt,fail
Both \%hash and {%hash} create references, but they reference two different things.
\%hash is a ref to %hash. If dereferenced, its values will change with the values in %hash.
{%hash} creates a new anonymous hash reference from the values in %hash. It creates a copy. It's the simplest way of creating a shallow copy of a data structure in Perl. If you alter %hash, this copy is not affected.
How long a variable lives has nothing to do with what kind the variable is, or how it was created. Only the scope is relevant for that. References in Perl are a special case here, because there is an internal ref counter that keeps track of references to a value, so that it is kept alive if there are still references around somewhere even if it goes out of scope. That's why this works:
sub frobnicate {
my %hash = ( foo => 'bar' );
return \%hash;
}
If you want to disassociate the reference from the initial value, you need to turn it into a weak reference via weaken from Scalar::Util. That way, the ref count will not be influenced by it, but it will still be related to the value, while a copy would not be.
See perlref and perlreftut for more information on references. This question deals with how to see the ref count. A description for that is also available in the chapter Reference Counts and Mortality in perlguts.
You can't really compare \ to {} and [] since they don't do the same thing at all.
{ LIST } is short for my %anon = LIST; \%anon
[ LIST ] is short for my #anon = LIST; \#anon
Maybe you meant to compare
 
my %hash = ...;
push #a, \%hash;
 
push #a, { ... };
 
my %hash = ...;
push #a, { %hash };
The first snippet places a reference to %hash in #a. This is presumably found in a loop. As long as my %hash is found in the loop, a reference to a new hash will be placed in #a each time.
The second snippet does the same, just using an anonymous hash.
The third snippet makes a copy of %hash, and places a reference to that copy in #a. It gives the impression of wastefulness, so it's discouraged. (It's not actually not that wasteful because it allows %hash to be reused.)
You could also write your code
# In reality, the two blocks below are probably the body of one sub or one loop.
{
my %hash = (
a => 1,
b => 2,
c => 3,
);
push #a, \%hash;
}
{
my %hash = (
d => 3,
e => 1,
f => 2,
);
push #a, \%hash;
}
or
push #a, {
a => 1,
b => 2,
c => 3,
};
push #a, {
d => 3,
e => 1,
f => 2,
};
my #cols_of_interest = qw( dev_file test_file diff_file status );
my %csv_data;
if (defined( my $row = <DATA> )) {
chomp $row;
my #cols = split(/,/, $row);
my %cols_of_interest = map { $_ => 1 } #cols_of_interest;
my #cols_to_delete = grep { !$cols_of_interest{$_} } #cols;
while ( my $row = <DATA> ) {
chomp $row;
my %row; #row{#cols} = split(/,/, $row);
delete #row{#cols_to_delete};
push #{ $csv_data{ $row{dev_file} }{ "$row{test_file}:$row{diff_file}" } }, \%row;
}
}
Better yet, let's use a proper CSV parser.
use Text::CSV_XS qw( );
my #cols_of_interest = qw( dev_file test_file diff_file status );
my $csv = Text::CSV_XS->new({
auto_diag => 2,
binary => 1,
});
my #cols = $csv->header(\*DATA);
my %cols_of_interest = map { $_ => 1 } #cols_of_interest;
my #cols_to_delete = grep { !$cols_of_interest{$_} } #cols;
my %csv_data;
while ( my $row = $csv->getline_hr(\*DATA) ) {
delete #$row{#cols_to_delete};
push #{ $csv_data{ $row->{dev_file} }{ "$row->{test_file}:$row->{diff_file}" } }, $row;
}

Adding hash keys

I am adding data to a hash using an incrementing numeric key starting at 0. The key/value is fine. When I add the second one, the first key/value pair points back to the second. Each addition after that replaces the value of the second key and then points back to it. The Dumper output would be something like this.
$VAR1 = { '0' => { ... } };
After the first key/value is added. After the second one is added I get
$VAR1= { '1' => { ... }, '0' => $VAR1->{'1} };
After the third key/value is added, it looks like this.
$VAR1 = { '1' => { ... }, '0' => $VAR1->{'1'}, '2' => $VAR1->{'1'} };
My question is why is it doing this? I want each key/value to show up in the hash. When I iterate through the hash I get the same data for every key/value. How do I get rid of the reference pointers to the second added key?
You are setting the value of every element to a reference to the same hash. Data::Dumper is merely reflecting that.
If you're using Data::Dumper as a serializing tool (yuck!), then you should set $Data::Dumper::Purity to 1 to get something eval can process.
use Data::Dumper qw( Dumper );
my %h2 = (a=>5,b=>6,c=>7);
my %h;
$h{0} = \%h2;
$h{1} = \%h2;
$h{2} = \%h2;
print("$h{0}{c} $h{2}{c}\n");
$h{0}{c} = 9;
print("$h{0}{c} $h{2}{c}\n");
{
local $Data::Dumper::Purity = 1;
print(Dumper(\%h));
}
Output:
7 7
9 9
$VAR1 = {
'0' => {
'c' => 9,
'a' => 5,
'b' => 6
},
'1' => {},
'2' => {}
};
$VAR1->{'0'} = $VAR1->{'1'};
$VAR1->{'2'} = $VAR1->{'1'};
If, on the other hand, you didn't mean to use store references to different hashes, you could use
# Shallow copies
$h{0} = { %h2 }; # { ... } means do { my %anon = ( ... ); \%anon }
$h{1} = { %h2 };
$h{2} = { %h2 };
or
# Deep copies
use Storable qw( dclone );
$h{0} = dclone(\%h2);
$h{1} = dclone(\%h2);
$h{2} = dclone(\%h2);
Output:
7 7
9 7
$VAR1 = {
'0' => {
'a' => 5,
'b' => 6,
'c' => 9
},
'1' => {
'a' => 5,
'b' => 6,
'c' => 7
},
'2' => {
'a' => 5,
'b' => 6,
'c' => 7
}
};
You haven't posted the actual code you're using to build the hash, but I assume it looks something like this:
foreach my $i (1 .. 3) {
%hash2 = (number => $i, foo => "bar", baz => "whatever");
$hash1{$i} = \%hash2;
}
(Actually, I'll guess that, in your actual code, you're probably reading data from a file in a while (<>) loop and assigning values to %hash2 based on it, but the foreach loop will do for demonstration purposes.)
If you run the code above and dump the resulting %hash1 using Data::Dumper, you'll get the output:
$VAR1 = {
'1' => {
'baz' => 'whatever',
'number' => 3,
'foo' => 'bar'
},
'3' => $VAR1->{'1'},
'2' => $VAR1->{'1'}
};
Why does it look like that? Well, it's because the values in %hash1 are all references pointing to the same hash, namely %hash2. When you assign new values to %hash2 in your loop, those values will overwrite the old values in %hash2, but it will still be the same hash. Data::Dumper is just highlighting that fact.
So, how can you fix it? Well, there are (at least) two ways. One way is to replace \%hash2, which gives a reference to %hash2, with { %hash2 }, which copies the contents of %hash2 into a new anonymous hash and returns a reference to that:
foreach my $i (1 .. 3) {
%hash2 = (number => $i, foo => "bar", baz => "whatever");
$hash1{$i} = { %hash2 };
}
The other (IMO preferable) way is to declare %hash2 as a (lexically scoped) local variable within the loop using my:
foreach my $i (1 .. 3) {
my %hash2 = (number => $i, foo => "bar", baz => "whatever");
$hash1{$i} = \%hash2;
}
This way, each iteration of the loop will create a new, different hash named %hash2, while the hashes created on previous iterations will continue to exist (since they're referenced from %hash1) independently.
By the way, you wouldn't have had this problem in the first place if you'd followed standard Perl best practices, specifically:
Always use strict; (and use warnings;). This would've forced you to declare %hash2 with my (although it wouldn't have forced you to do so inside the loop).
Always declare local variables in the smallest possible scope. In this case, since %hash2 is only used within the loop, you should've declared it inside the loop, like above.
Following these best practices, the example code above would look like this:
use strict;
use warnings;
use Data::Dumper qw(Dumper);
my %hash1;
foreach my $i (1 .. 3) {
my %hash2 = (number => $i, foo => "bar", baz => "whatever");
$hash1{$i} = \%hash2;
}
print Dumper(\%hash1);
which, as expected, will print:
$VAR1 = {
'1' => {
'baz' => 'whatever',
'number' => 1,
'foo' => 'bar'
},
'3' => {
'baz' => 'whatever',
'number' => 3,
'foo' => 'bar'
},
'2' => {
'baz' => 'whatever',
'number' => 2,
'foo' => 'bar'
}
};
It's hard to see what the problem is when you don't post the code or the actual results of Data::Dumper.
There is one thing you should know about Data::Dumper: When you dump an array or (especially) a hash, you should dump a reference to it. Otherwise, Data::Dumper will treat it like a series of variables. Also notice that hashes do not remain in the order you create them. I've enclosed an example below. Make sure that your issue isn't related to a confusing Data::Dumper output.
Another question: If you're keying your hash by sequential keys, would you be better off with an array?
If you can, please edit your question to post your code and the ACTUAL results.
use strict;
use warnings;
use autodie;
use feature qw(say);
use Data::Dumper;
my #array = qw(one two three four five);
my %hash = (one => 1, two => 2, three => 3, four => 4);
say "Dumped Array: " . Dumper #array;
say "Dumped Hash: " . Dumper %hash;
say "Dumped Array Reference: " . Dumper \#array;
say "Dumped Hash Reference: " . Dumper \%hash;
The output:
Dumped Array: $VAR1 = 'one';
$VAR2 = 'two';
$VAR3 = 'three';
$VAR4 = 'four';
$VAR5 = 'five';
Dumped Hash: $VAR1 = 'three';
$VAR2 = 3;
$VAR3 = 'one';
$VAR4 = 1;
$VAR5 = 'two';
$VAR6 = 2;
$VAR7 = 'four';
$VAR8 = 4;
Dumped Array Reference: $VAR1 = [
'one',
'two',
'three',
'four',
'five'
];
Dumped Hash Reference: $VAR1 = {
'three' => 3,
'one' => 1,
'two' => 2,
'four' => 4
};
The reason it is doing this is you are giving it the same reference to the same hash.
Presumably in a loop construct.
Here is a simple program which has this behaviour.
use strict;
use warnings;
# always use the above two lines until you
# understand completely why they are recommended
use Data::Printer;
my %hash;
my %inner; # <-- wrong place to put it
for my $index (0..5){
$inner{int rand} = $index; # <- doesn't matter
$hash{$index} = \%inner;
}
p %hash;
To fix it just make sure that you are creating a fresh hash reference every time through the loop.
use strict;
use warnings;
use Data::Printer;
my %hash;
for my $index (0..5){
my %inner; # <-- place the declaration here instead
$inner{int rand} = $index; # <- doesn't matter
$hash{$index} = \%inner;
}
p %hash;
If you are only going to use numbers for your indexes, and they are monotonically increasing starting from 0, then I would recommend using an array.
An array would be faster and more memory efficient.
use strict;
use warnings;
use Data::Printer;
my #array; # <--
for my $index (0..5){
my %inner;
$inner{int rand} = $index;
$array[$index] = \%inner; # <--
}
p #array;

Sorting a hash from the first key to the last (Perl)

I have the following hash, and I wish to keep it in the order I've set it in; is this even possible? If not, do any alternatives exist?
my %hash = ('Key1' => 'Value1', 'Key2' => 'Value2', 'Key3' => 'Value3');
Do I need to write a custom sorting subroutine? What are my options?
Thank you!
http://metacpan.org/pod/Tie::IxHash
use Tie::IxHash;
my %hash;
tie %hash,'Tie::IxHash';
This hash will maintain its order.
See Tie::Hash::Indexed. Quoting its Synopsis:
use Tie::Hash::Indexed;
tie my %hash, 'Tie::Hash::Indexed';
%hash = ( I => 1, n => 2, d => 3, e => 4 );
$hash{x} = 5;
print keys %hash, "\n"; # prints 'Index'
print values %hash, "\n"; # prints '12345'
Try doing this :
print "$_=$hash{$_}\n" for sort keys %hash;
if you want it sorted in alphabetic order.
If you need to retain original order, see other posts.
See http://perldoc.perl.org/functions/sort.html
One possibility is to do the same as you sometimes do with arrays: specify the keys.
for (0..$#a) { # Sorted array keys
say $a[$_];
}
for (sort keys %h) { # Sorted hash keys
say $h{$_};
}
for (0, 1, 3) { # Sorted array keys
say $h{$_};
}
for (qw( Key1 Key2 Key3 )) { # Sorted hash keys
say $h{$_};
}
You can also fetch the ordered values as follows:
my #values = #h{qw( Key1 Key2 Key3 )};
This depends on how you're going to access the data. If you just want to store them and access the last/first values, you can always put hashes in an array and use push() and pop().
#!/usr/bin/env perl
use strict;
use warnings;
use v5.10;
use Data::Dumper;
my #hashes;
foreach( 1..5 ){
push #hashes, { "key $_" => "foo" };
}
say Dumper(\#hashes);

sum hash of hash values using perl

I have a Perl script that parses an Excel file and does the following : It counts for each value in column A, the number of elements it has in column B, the script looks like this :
use strict;
use warnings;
use Spreadsheet::XLSX;
use Data::Dumper;
use List::Util qw( sum );
my $col1 = 0;
my %hash;
my $excel = Spreadsheet::XLSX->new('inout_chartdata_ronald.xlsx');
my $sheet = ${ $excel->{Worksheet} }[0];
$sheet->{MaxRow} ||= $sheet->{MinRow};
my $count = 0;
# Iterate through each row
foreach my $row ( $sheet->{MinRow}+1 .. $sheet->{MaxRow} ) {
# The cell in column 1
my $cell = $sheet->{Cells}[$row][$col1];
if ($cell) {
# The adjacent cell in column 2
my $adjacentCell = $sheet->{Cells}[$row][ $col1 + 1 ];
# Use a hash of hashes
$hash{ $cell->{Val} }{ $adjacentCell->{Val} }++;
}
}
print "\n", Dumper \%hash;
The output looks like this :
$VAR1 = {
'13' => {
'klm' => 1,
'hij' => 2,
'lkm' => 4,
},
'12' => {
'abc' => 2,
'efg' => 2
}
};
This works great, my question is : How can I access the elements of this output $VAR1 in order to do : for value 13, klm + hij = 3 and get a final output like this :
$VAR1 = {
'13' => {
'somename' => 3,
'lkm' => 4,
},
'12' => {
'abc' => 2,
'efg' => 2
}
};
So basically what I want to do is loop through my final hash of hashes and access its specific elements based on a unique key and finally do their sum.
Any help would be appreciated.
Thanks
I used #do_sum to indicate what changes you want to make. The new key is hardcoded in the script. Note that the new key is not created if no key exists in the subhash (the $found flag).
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
my %hash = (
'13' => {
'klm' => 1,
'hij' => 2,
'lkm' => 4,
},
'12' => {
'abc' => 2,
'efg' => 2
}
);
my #do_sum = qw(klm hij);
for my $num (keys %hash) {
my $found;
my $sum = 0;
for my $key (#do_sum) {
next unless exists $hash{$num}{$key};
$sum += $hash{$num}{$key};
delete $hash{$num}{$key};
$found = 1;
}
$hash{$num}{somename} = $sum if $found;
}
print Dumper \%hash;
It sounds like you need to learn about Perl References, and maybe Perl Objects which are just a nice way to deal with references.
As you know, Perl has three basic data-structures:
Scalars ($foo)
Arrays (#foo)
Hashes (%foo)
The problem is that these data structures can only contain scalar data. That is, each element in an array can hold a single value or each key in a hash can hold a single value.
In your case %hash is a Hash where each entry in the hash references another hash. For example:
Your %hash has an entry in it with a key of 13. This doesn't contain a scalar value, but a references to another hash with three keys in it: klm, hij, and lkm. YOu can reference this via this syntax:
${ hash{13} }{klm} = 1
${ hash{13} }{hij} = 2
${ hash{13} }{lkm} = 4
The curly braces may or may not be necessary. However, %{ hash{13} } references that hash contained in $hash{13}, so I can now reference the keys of that hash. You can imagine this getting more complex as you talk about hashes of hashes of arrays of hashes of arrays. Fortunately, Perl includes an easier syntax:
$hash{13}->{klm} = 1
%hash{13}->{hij} = 2
%hash{13}->{lkm} = 4
Read up about hashes and how to manipulate them. After you get comfortable with this, you can start working on learning about Object Oriented Perl which handles references in a safer manner.

How can I get a only part of a hash in Perl?

Is there a way to get a sub-hash? Do I need to use a hash slice?
For example:
%hash = ( a => 1, b => 2, c => 3 );
I want only
%hash = ( a => 1, b => 2 );
Hash slices return the values associated with a list of keys. To get a hash slice you change the sigil to # and provide a list of keys (in this case "a" and "b"):
my #items = #hash{"a", "b"};
Often you can use a quote word operator to produce the list:
my #items = #hash{qw/a b/};
You can also assign to a hash slice, so if you want a new hash that contains a subset of another hash you can say
my %new_hash;
#new_hash{qw/a b/} = #hash{qw/a b/};
Many people will use a map instead of hash slices:
my %new_hash = map { $_ => $hash{$_} } qw/a b/;
Starting with Perl 5.20.0, you can get the keys and the values in one step if you use the % sigil instead of the # sigil:
my %new_hash = %hash{qw/a b/};
You'd probably want to assemble a list of keys you want:
my #keys = qw(a b);
And then use a loop to make the hash:
my %hash_slice;
for(#keys) {
$hash_slice{$_} = %hash{$_};
}
Or:
my %hash_slice = map { $_ => $hash{$_} } #keys;
(My preference is the second one, but whichever one you like is best.)
Yet another way:
my #keys = qw(a b);
my %hash = (a => 1, b => 2, c => 3);
my %hash_copy;
#hash_copy{#keys} = #hash{#keys};
Too much functional programming leads me to think of zip first.
With List::MoreUtils installed,
use List::MoreUtils qw(zip);
%hash = qw(a 1 b 2 c 3);
#keys = qw(a b);
#values = #hash{#keys};
%hash = zip #keys, #values;
Unfortunately, the prototype of List::MoreUtils's zip inhibits
zip #keys, #hash{#keys};
If you really want to avoid the intermediate variable, you could
zip #keys, #{[#hash{#keys}]};
Or just write your own zip without the problematic prototype. (This doesn't need List::MoreUtils at all.)
sub zip {
my $max = -1;
$max < $#$_and $max = $#$_ for #_;
map { my $ix = $_; map $_->[$ix], #_; } 0..$max;
}
%hash = zip \#keys, [#hash{#keys}];
If you're going to be mutating in-place,
%hash = qw(a 1 b 2 c 3);
%keep = map +($_ => 1), qw(a b);
$keep{$a} or delete $hash{$a} while ($a, $b) = each %hash;
avoids the extra copying that the map and zip solutions incur. (Yes, mutating the hash while you're iterating over it is safe... as long as the mutation is only deleting the most recently iterated pair.)
FWIW, I use Moose::Autobox here:
my $hash = { a => 1, b => 2, c => 3, d => 4 };
$hash->hslice([qw/a b/]) # { a => 1, b => 2 };
In real life, I use this to extract "username" and "password" from a form submission, and pass that to Catalyst's $c->authenticate (which expects, in my case, a hashref containing the username and password, but nothing else).
New in perl 5.20 is hash slices returning keys as well as values by using % like on the last line here:
my %population = ('Norway',5000000,'Sweden',9600000,'Denmark',5500000);
my #slice_values = #population{'Norway','Sweden'}; # all perls can do this
my %slice_hash = %population{'Norway','Sweden'}; # perl >= 5.20 can do this!
A hash is an unordered container, but the term slice only really makes sense in terms of an ordered container. Maybe look into using an array. Otherwise, you may just have to remove all of the elements that you don't want to produce your 'sub-hash'.