different hash construction processes in Perl


I noticed a hash object was once defined as in the following:
my %data = ();
$data{file} = $file;
$data{concept} = $#row;
$data{line1} {$cell[0]} = $cell[1];
What does this hash construction process try to achieve? Or what is the difference between
$data{concept} = $#row;
and
$data{line1} {$cell[0]} = $cell[1];
?

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $file = "contents of my file";
my @row = qw(some random data);
my @cell = qw(key value);
my %data = ();
$data{file} = $file;
$data{concept} = $#row;
$data{line1} {$cell[0]} = $cell[1];
print Dumper \%data;
Output:
$VAR1 = {
'file' => 'contents of my file',
'line1' => {
'key' => 'value'
},
'concept' => '2'
};
I think $data{line1} {$cell[0]} is better written as $data{line1}{$cell[0]} or (my preference) $data{line1}->{$cell[0]}.
I included the scalar $file and the arrays @row and @cell to demonstrate what your code means.
$data{file} = $file;
adds the contents of $file to your hash with the key file.
$data{concept} = $#row;
adds the last index of @row to your hash with the key concept. In my example the last index is 2, since the indexes in @row are 0, 1 and 2.
$data{line1} {$cell[0]} = $cell[1];
adds a hash ref to your hash with the key line1 (through autovivification) and adds the element $cell[1] to this hash ref with the key $cell[0]. Autovivification in this case means that Perl associates line1 with a hash ref and creates it, because you're accessing it with {$cell[0]}. That saves you the trouble of having to write:
$data{line1} = {};
$data{line1}{$cell[0]} = $cell[1];
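The difference between the two assignments can be checked in isolation. This is only a minimal sketch that reuses the sample values for @row and @cell from the example above; exists shows that the line1 key is absent until the nested assignment autovivifies it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @row  = qw(some random data);
my @cell = qw(key value);
my %data;

# $#row is the last index of @row; scalar(@row) is the element count.
$data{concept} = $#row;
my $count = scalar @row;
print "last index: $data{concept}, count: $count\n";

# Before the nested assignment there is no 'line1' key at all.
my $before = exists $data{line1} ? 'yes' : 'no';

# Assigning through a second subscript autovivifies $data{line1}
# as a reference to a new anonymous hash.
$data{line1}{ $cell[0] } = $cell[1];
my $kind = ref $data{line1};

print "line1 existed before: $before\n";
print "line1 is now a $kind reference\n";
print "line1 -> key: $data{line1}{key}\n";
```

So $data{concept} holds a plain number, while $data{line1} holds a reference to a second hash.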

Related

Perl: undefined value as a HASH reference [closed]

I have inherited an older script that uses hash references that I don't understand. It results in:
Can't use an undefined value as a HASH reference at
./make_quar_dbfile.pl line 65.
63 my $bucket = sprintf('%02x', $i);
64 my $file = sprintf('%s/%02x.db', $qdir, $i);
65 %{$hashes{$bucket}} ? 1 : next;
66 tie (my %hash, 'DB_File', $file, O_RDWR, 0600) || die "Can't open db file: $! \n ";
67 %hash = %{$hashes{$bucket}};
68 untie %hash;
The script reads through a number of gzipped emails to identify the sender/recip/subject/date etc., then writes that info to a DB_File hash.
This script used to work with older versions of Perl, but it looks like it is no longer compatible.
I'd really like to understand how this works, but I don't fully understand reference/dereference, why it's even necessary here, and the %{$var} notation. All of the references I've studied talk about hash references in terms of $hash_ref = \%author; not %hash_ref = %{$author}, for example.
Ideas on how to get this to work with hash references would be greatly appreciated.
#!/usr/bin/perl -w
use DB_File;
use File::Basename qw(basename);
use vars qw($verbose);
use strict;
use warnings;
sub DBG($);
$verbose = shift || 1;
my $qdir = '/var/spool/amavisd/qdb';
my $source_dir = '/var/spool/amavisd/quarantine';
my $uid = getpwnam('amavis');
my $gid = getgrnam('amavis');
my %hashes = ( );
my $me = basename($0);
my $version = '1.9';
my $steps = 100;
my $cnt = 0;
DBG("- Creating initial database files...");
for (my $i = 0; $i < 256; $i++) {
my $file = sprintf('%s/%02x.db', $qdir, $i);
unlink $file || DBG("Could not unlink $file to empty db: $! \n");
tie (my %hash, "DB_File", $file, O_CREAT, 0600) || die "Can't open db file: $! \n";
untie %hash;
chown($uid, $gid, $file) || die "Unable to set attributes on file: $! \n";
}
DBG("done\n");
opendir SOURCEDIR, $source_dir || die "Cannot open $source_dir: $! \n";
DBG("- Building hashes... ");
foreach my $f (sort readdir SOURCEDIR) {
next if ($f eq "." || $f eq "..");
if ($f =~ m/^(spam|virus)\-([^\-]+)\-([^\-]+)(\.gz)?/) {
my $type = $1;
my $key = $3;
my $bucket = substr($key, 0, 2);
my $d = $2;
my $subj = '';
my $to = '';
my $from = '';
my $size = '';
my $score = '0.0';
if (($cnt % $steps) == 0) { DBG(sprintf("\e[8D%-8d", $cnt)); } $cnt++;
if ($f =~ /\.gz$/ && open IN, "zcat $source_dir/$f |") {
while(<IN>) {
last if ($_ eq "\n");
$subj = $1 if (/^Subject:\s*(.*)$/);
$to = $1 if (/^To:\s*(.*)$/);
$from = $1 if (/^From:\s*(.*)$/);
$score = $1 if (/score=(\d{1,3}\.\d)/);
}
close IN;
$to =~ s/^.*\<(.*)\>.*$/$1/;
$from =~ s/^.*\<(.*)\>.*$/$1/;
$size = (stat("$source_dir/$f"))[7];
$hashes{$bucket}->{$f} = "$type\t$d\t$size\t$from\t$to\t$subj\t$score";
}
}
}
closedir SOURCEDIR;
DBG("...done\n\n- Populating database files...");
for (my $i = 0; $i < 256; $i++) {
my $bucket = sprintf('%02x', $i);
my $file = sprintf('%s/%02x.db', $qdir, $i);
%{$hashes{$bucket}} ? 1 : next;
tie (my %hash, 'DB_File', $file, O_RDWR, 0600) || die "Can't open db file: $! \n ";
%hash = %{$hashes{$bucket}};
untie %hash;
}
exit(0);
sub DBG($) { my $msg = shift; print $msg if ($verbose); }

What is $hash{$key}? A value associated with the (value of) $key, which must be a scalar. So we get the $value out of my %hash = ( $key => $value ).
That's a string, or a number. Or a filehandle. Or, an array reference, or a hash reference. (Or an object perhaps, normally a blessed hash reference.) They are all scalars, single-valued things, and as such are a legitimate value in a hash.
The syntax %{ ... } de-references a hash reference† so judged by %{ $hashes{$bucket} } that code expects there to be a hash reference. So the error says that there is actually nothing in %hashes for that value of a would-be key ($bucket), so it cannot "de-reference" it. There is either no key that is the value of $bucket at that point in the loop, or there is such a key but it has never been assigned anything.
So go debug it. Add print statements in the loops so you can see which keys and values are actually there, and which ones differ from what the code assumes. It is hard to tell what fails without running the program.
Then, the line %{$hashes{$bucket}} ? 1 : next; is a little silly. The condition of the ternary operator evaluates to a boolean, "true" (not undefined, not 0, not empty string '') or false. So it tests whether $hashes{$bucket} has a hashref with at least some keys, and if it does then it returns 1; so, the for loop continues. Otherwise it skips to the next iteration.
Well, then skip to next if there is not a (non-empty) hashref there:
next if not defined $hashes{$bucket} or not %{ $hashes{$bucket} };
Note how we first test whether there is a defined value for that key, and only then attempt to dereference it.
† Whatever expression may be inside the curlies must evaluate to a hash reference. (If it's something else, like a number or a string, the code would still exit with an error, but a different one.)
So, in this code, the hash %hashes must have a key that is the value of $bucket at that point, and the value for that key must be a hash reference. Then, the ternary operator tests whether the hash obtained from that hash reference has any keys.
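A guard like that can be wrapped in a small helper. The following is only a sketch: the helper name non_empty_hashref and the sample contents of %hashes are made up for illustration, and the check with ref is slightly stricter than defined because it also rejects values that are not hash references at all.

```perl
use strict;
use warnings;

# Hypothetical sample data: only bucket '0a' holds a non-empty hash ref.
my %hashes = (
    '0a' => { 'spam-1' => 'record' },
    '0b' => {},
);

# Returns true only if the value is a hash reference with at least one key.
sub non_empty_hashref {
    my ($value) = @_;
    return 0 unless ref($value) eq 'HASH';
    return %$value ? 1 : 0;
}

my @populated;
for my $i (0 .. 255) {
    my $bucket = sprintf '%02x', $i;
    # Skip buckets that are missing, undefined or empty.
    next unless non_empty_hashref( $hashes{$bucket} );
    push @populated, $bucket;
}
print "populated buckets: @populated\n";
```

Buckets with no entry in %hashes never reach the dereference, so the loop no longer dies on them.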

You need to understand references first, this is a kind of how-to :
#!/usr/bin/perl
use strict; use warnings;
use feature qw/say/;
use Data::Dumper;
my $var = {}; # I create a HASH ref explicitly
say "I created a HASH ref explicitly:";
say ref($var);
say "Now, let's add any type of content:";
say "Adding a ARRAY:";
push @{ $var->{arr} }, (0..5);
say Dumper $var;
say "Now, I add a new HASH";
$var->{new_hash} = {
foo => "value",
bar => "other"
};
say Dumper $var;
say 'To access the data in $var without Data::Dumper, we need to dereference what we want to retrieve';
say "to retrieve a HASH ref, we need to dereference with %:";
while (my ($key, $value) = each %{ $var->{new_hash} }) {
say "key=$key value=$value";
}
say "To retrieve the ARRAY ref:";
say join "\n", @{ $var->{arr} };
Output
I created a HASH ref explicitly:
HASH
Now, let's add any type of content:
Adding a ARRAY:
$VAR1 = {
'arr' => [
0,
1,
2,
3,
4,
5
]
};
Now, I add a new HASH
$VAR1 = {
'new_hash' => {
'foo' => 'value',
'bar' => 'other'
},
'arr' => [
0,
1,
2,
3,
4,
5
]
};
To access the data in $var without Data::Dumper, we need to dereference what we want to retrieve
to retrieve a HASH ref, we need to dereference with %:
key=foo value=value
key=bar value=other
To retrieve the ARRAY ref:
0
1
2
3
4
5
Now with your code, instead of
%{$hashes{$bucket}} ? 1 : next;
You should test the hash ref first. Since Perl says it's undefined, let's debug a bit:
use Data::Dumper;
print Dumper \%hashes;
print "bucket=$bucket\n";
if (defined $hashes{$bucket}) {
  print "Defined hash ref\n";
}
else {
  print "NOT defined hash ref\n";
}

How can multiple hash values be retrieved using perl?

How can multiple hash values be retrieved? I tried using
use Hash::MultiValue and get_all(). It throws an error saying "Can't call method "get_all" on an undefined value" . Which is the better option to implement this functionality of multiple values for a particular key ? The value of the key is the file that is being opened.
use warnings;
use List::MoreUtils qw(firstidx);
use Hash::MultiValue;
my $key_in;
…
open ($FR, "<$i") or die "couldn't open list";
while($line=<$FR>){
if($line =~ /search_pattern/){
my $idx = firstidx { $_ eq 'hash_key' } @tags;
my $key= @tags[$idx+1];
$hash{$key}= Hash::MultiValue->new($key=>'$i');
}
close($FR);
for my $key_in ( sort keys %hash ) {
@key_in = $hash->get_all('$key_in');
print "$key_in = $hash{$key_in}\n";
}
my $key_in = <STDIN>;
if (exists($hash{$key_in})){
$hash_value = $hash{$key_in};
}else{
exit;
}

I think you want an array reference for the value. You can then treat that as an array. This is the sort of stuff we show you in Intermediate Perl:
$hash{$key} = [];
push @{ $hash{$key} }, $some_value;
my @values = @{ $hash{$key} };
With Perl v5.24, you can use postfix dereferencing to make it a bit prettier:
use v5.24;
$hash{$key} = [];
push $hash{$key}->@*, 'foo';
push $hash{$key}->@*, 'bar';
my @values = $hash{$key}->@*;
And, since Perl automatically takes an undefined value and turns it into the reference structure you need (autovivification), you don't need to initialize an undefined value:
use v5.24;
push $hash{$key}->@*, 'foo';
push $hash{$key}->@*, 'bar';
my @values = $hash{$key}->@*;
Get all the keys of a hash:
my @keys = keys %hash;
Get all of the values (in the order of the corresponding keys if you haven't changed the hash since you called keys):
my @values = values %hash;
Get some values with a hash slice:
my @some_values = @hash{@some_keys};
Get some keys and values (key-value slice):
use v5.20;
my %smaller_hash = %hash{@some_keys};
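Put together, the array-reference approach might look like the sketch below. The hash name %files_for and the tag and file names are invented for illustration, and the classic @{ ... } dereference is used so that the example does not depend on v5.24.

```perl
use strict;
use warnings;

my %files_for;    # tag => array ref of file names (hypothetical data)

# push autovivifies the array reference on first use of each key.
push @{ $files_for{tag1} }, 'file1';
push @{ $files_for{tag1} }, 'file2';
push @{ $files_for{tag2} }, 'file3';

# All values stored under one key.
my @tag1_files = @{ $files_for{tag1} };
print "tag1: @tag1_files\n";

# A hash slice fetches the values for several keys at once;
# here each value is itself an array reference.
my @refs   = @files_for{qw(tag1 tag2)};
my @counts = map { scalar @$_ } @refs;
print "counts: @counts\n";
```

Each key keeps every value pushed under it, in insertion order.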

Here is an example of how you can use get_all() from Hash::MultiValue to retrieve multiple hash values for a given key:
use strict;
use warnings;
use Data::Dumper qw(Dumper);
use Hash::MultiValue;
my $hash = Hash::MultiValue->new();
$hash->add(tag1 => 'file1');
$hash->add(tag1 => 'file2');
$hash->add(tag2 => 'file3');
my @foo = $hash->get_all('tag1');
print(Dumper(\@foo));
Output:
$VAR1 = [
'file1',
'file2'
];

How to split a string into multiple hash keys in perl

I have a series of strings for example
my @strings;
$strings[1] = 'foo/bar/some/more';
$strings[2] = 'also/some/stuff';
$strings[3] = 'this/can/have/way/too/many/substrings';
What I would like to do is to split these strings and store them in a hash as keys like this
my %hash;
$hash{foo}{bar}{some}{more} = 1;
$hash{also}{some}{stuff} = 1;
$hash{this}{can}{have}{way}{too}{many}{substrings} = 1;
I could go on and list my failed attempts, but I don't think they add value to the question, so I will mention only one. Let's say I converted 'foo/bar/some/more' to '{foo}{bar}{some}{more}'. Could I somehow store that in a variable and do something like the following?
my $var = '{foo}{bar}{some}{more}';
$hash$var = 1;
NOTE: THIS DOESN'T WORK, but I hope it only doesn't due to a syntax error.
All help appreciated.

Identical logic to Shawn's answer. But I've hidden the clever hash-walking bit in a subroutine. And I've set the final value to 1 rather than an empty hash reference.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
my @keys = qw(
  foo/bar/some/more
  also/some/stuff
  this/can/have/way/too/many/substrings
);
my %hash;
for (@keys) {
  multilevel(\%hash, $_);
}
say Dumper \%hash;
sub multilevel {
  my ($hashref, $string) = @_;
  my $curr_ref = $hashref;
  my @strings = split m[/], $string;
  # Walk (and create where missing) every level except the last
  for (@strings[0 .. $#strings - 1]) {
    $curr_ref->{$_} //= {};
    $curr_ref = $curr_ref->{$_};
  }
  # The final component gets the value 1
  $curr_ref->{$strings[-1]} = 1;
}

You have to use hash references to walk down through the list of keys.
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
my %hash = ();
while( my $string = <DATA> ){
  chomp $string;
  my @keys = split /\//, $string;
  my $hash_ref = \%hash;
  for my $key ( @keys ){
    # //= keeps an existing sub-hash when two paths share a prefix
    $hash_ref->{$key} //= {};
    $hash_ref = $hash_ref->{$key};
  }
}
say Dumper \%hash;
__DATA__
foo/bar/some/more
also/some/stuff
this/can/have/way/too/many/substrings
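
One detail worth checking in any hash-walking loop is what happens when two paths share a prefix. The sketch below uses a hypothetical helper add_path and made-up paths. It relies on //= (defined-or assignment, available since Perl 5.10), which only creates a level when it is missing; a plain = {} at that point would replace a branch that already exists.

```perl
use strict;
use warnings;

# Hypothetical helper: walk, and create where needed, one level per component.
sub add_path {
    my ($root, $path) = @_;
    my $node = $root;
    for my $key (split m{/}, $path) {
        $node->{$key} //= {};    # keep an existing branch, create a missing one
        $node = $node->{$key};
    }
    return;
}

my %tree;
add_path(\%tree, 'foo/bar/some');
add_path(\%tree, 'foo/baz');

# Both branches under 'foo' survive the second call.
my @children = sort keys %{ $tree{foo} };
print "children of foo: @children\n";
```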

Just use a library.
use Data::Diver qw(DiveVal);
my @strings = (
undef,
'foo/bar/some/more',
'also/some/stuff',
'this/can/have/way/too/many/substrings',
);
my %hash;
for my $index (1..3) {
my $root = {};
DiveVal($root, split '/', $strings[$index]) = 1;
%hash = (%hash, %$root);
}
__END__
(
also => {some => {stuff => 1}},
foo => {bar => {some => {more => 1}}},
this => {can => {have => {way => {too => {many => {substrings => 1}}}}}},
)

I took the easy way out w/'eval':
use Data::Dumper;
%hash = ();
@strings = ( 'this/is/a/path', 'and/another/path', 'and/one/final/path' );
foreach ( @strings ) {
  s/\//\}\{/g;
  $str = '{' . $_ . '}'; # version 2: remove this line, and then
  eval( "\$hash$str = 1;" ); # eval( "\$hash{$_} = 1;" );
}
print Dumper( \%hash )."\n";

Build hash of hash in perl

I'm new to using perl and I'm trying to build a hash of a hash from a tsv. My current process is to read in a file and construct a hash and then insert it into another hash.
my %hoh = ();
while (my $line = <$tsv>)
{
chomp $line;
my %hash;
my @data = split "\t", $line;
my $id;
my $iter = each_array(@columns, @data);
while(my($k, $v) = $iter->())
{
$hash{$k} = $v;
if($k eq 'Id')
{
$id = $v;
}
}
$hoh{$id} = %hash;
}
print "dump: ", Dumper(%hoh);
This outputs:
dump
$VAR1 = '1234567890';
$VAR2 = '17/32';
$VAR3 = '1234567891';
$VAR4 = '17/32';
.....
Instead of what I would expect:
dump
{
'1234567890' => {
'k1' => 'v1',
'k2' => 'v2',
'k3' => 'v3',
'k4' => 'v4',
'id' => '1234567890'
},
'1234567891' => {
'k1' => 'v1',
'k2' => 'v2',
'k3' => 'v3',
'k4' => 'v4',
'id' => '1234567891'
},
........
};
My limited understanding is that when I do $hoh{$id} = %hash; it's inserting a reference to %hash? What am I doing wrong? Also, is there a more succinct way to use my columns and data arrays as key/value pairs in my %hash object?
-Thanks in advance,
Niru

To get a reference, you have to use \:
$hoh{$id} = \%hash;
%hash is the hash, not the reference to it. In scalar context, it returns the string X/Y where X is the number of used buckets and Y the number of all the buckets in the hash (i.e. nothing useful). Since Perl 5.26 it returns the number of keys instead, which is no more useful here.
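The two assignments can be compared directly with ref. A minimal sketch with made-up keys; the exact scalar stored by the first assignment depends on the Perl version, so it is not printed.

```perl
use strict;
use warnings;

my %hash = (k1 => 'v1', k2 => 'v2');
my %hoh;

# Assigning the hash itself puts it in scalar context: a plain scalar
# is stored (its exact value depends on the Perl version).
$hoh{wrong} = %hash;

# Assigning \%hash stores a reference to the hash.
$hoh{right} = \%hash;

my $wrong_kind = ref $hoh{wrong};    # empty string: not a reference
my $right_kind = ref $hoh{right};    # 'HASH'
print "wrong: '$wrong_kind', right: '$right_kind'\n";
print "k1 via reference: $hoh{right}{k1}\n";
```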

To get a reference to a hash variable, you need to use \%hash (as choroba said).
A more succinct way to assign values to columns is to assign to a hash slice, like this:
my %hoh = ();
while (my $line = <$tsv>)
{
chomp $line;
my %hash;
@hash{@columns} = split "\t", $line;
$hoh{$hash{Id}} = \%hash;
}
print "dump: ", Dumper(\%hoh);
A hash slice (@hash{@columns}) means essentially the same thing as ($hash{$columns[0]}, $hash{$columns[1]}, $hash{$columns[2]}, ...) up to however many columns you have. By assigning to it, I'm assigning the first value from split to $hash{$columns[0]}, the second value to $hash{$columns[1]}, and so on. It does exactly the same thing as your while ... $iter loop, just without the explicit loop (and it doesn't extract the $id).
There's no need to compare each $k to 'Id' inside a loop; just store it in the hash as a normal field and extract it afterwards with $hash{Id}. (Aside: Is your column header Id or id? You use Id in your loop, but id in your expected output.)
If you don't want to keep the Id field in the individual entries, you could use delete (which removes the key from the hash and returns the value):
$hoh{delete $hash{Id}} = \%hash;
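Here is the slice assignment and the delete trick on a single line of input. This is only a sketch: the column names and the sample line are invented, and the input filehandle is replaced by a literal string so that it runs on its own.

```perl
use strict;
use warnings;

# Hypothetical column names and one tab-separated input line.
my @columns = qw(Id k1 k2);
my $line    = "1234567890\tv1\tv2";

my (%hoh, %hash);

# Hash slice assignment: pairs each column name with the matching field.
@hash{@columns} = split /\t/, $line;

# delete returns the removed value, so it can serve as the outer key.
$hoh{ delete $hash{Id} } = \%hash;

my @inner_keys = sort keys %{ $hoh{'1234567890'} };
print "inner keys: @inner_keys\n";
```

The inner hash ends up without the Id field, which now only appears as the key in %hoh.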

Take a look at the documentation included with Perl. The perldoc command is very helpful. You can also look at the Perldoc webpage.
One of the tutorials is a tutorial on Perl references. It will help clarify a lot of your questions and explains referencing and dereferencing.
I also recommend that you look at CPAN. This is an archive of Perl modules that can do a wide variety of tasks. Look at Text::CSV. This module will do exactly what you want, and even though it says "CSV", it works with tab-separated files too.
You missed putting a backslash in front of the hash you're trying to make a reference to. You have:
$hoh{$id} = %hash;
Probably want:
$hoh{$id} = \%hash;
Also, when you dump a hash with Data::Dumper, you should pass a reference to the hash. A bare hash is flattened into a list of keys and values, so Dumper prints each one as a separate $VARn.
You have:
print "dump: ", Dumper(%hoh);
You should have:
print "dump: ", Dumper( \%hoh );
My attempt at the program:
#! /usr/bin/env perl
#
use warnings;
use strict;
use autodie;
use feature qw(say);
use Data::Dumper;
use constant {
FILE => "test.txt",
};
open my $fh, "<", FILE;
#
# First line with headers
#
my $line = <$fh>;
chomp $line;
my @headers = split /\t/, $line;
my %hash_of_hashes;
#
# Rest of file
#
while ( my $line = <$fh> ) {
chomp $line;
my %line_hash;
my @values = split /\t/, $line;
for my $index ( ( 0..$#values ) ) {
$line_hash{ $headers[$index] } = $values[ $index ];
}
$hash_of_hashes{ $line_hash{id} } = \%line_hash;
}
say Dumper \%hash_of_hashes;

You should only store a reference to a variable if you do so in the last line before the variable goes out of scope. In your script, you declare %hash inside the while loop, so placing this statement as the last in the loop is safe:
$hoh{$id} = \%hash;
If it's not the last statement (or you're not sure it's safe), create an anonymous structure to hold the contents of the variable:
$hoh{$id} = { %hash };
This makes a copy of %hash, which is slower, but any subsequent changes to %hash will not affect what you stored.
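The difference is easy to see by changing the hash after storing it both ways. A minimal sketch with a made-up key; note that { %hash } copies only the top level, so nested references inside it would still be shared.

```perl
use strict;
use warnings;

my %hash = (name => 'original');

my $alias = \%hash;       # reference to the same hash
my $copy  = { %hash };    # new anonymous hash holding a shallow copy

# Change the original after both have been stored.
$hash{name} = 'changed';

print "alias sees: $alias->{name}\n";
print "copy sees:  $copy->{name}\n";
```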

cant get array of hashes in perl

I have employee CSV data and I am trying to insert each employee hash into an array:
open($empOutFh,">empOut.txt")
$hash= [];
while(<$empFh>) {
@columnNames = split /,/, $_ if $.==1;
@columnValues = split /,/, $_;
%row = map{$_=>shift @columnValues}@columnNames;
push @$hash,\%row;
}
print Dumper($hash);
I am getting this output:
$VAR1 = [
{
'emp_no' => '11000',
'hire_date
' => '1988-08-20
',
'birth_date' => '1960-09-12',
'gender' => 'M',
'last_name' => 'Bonifati',
'first_name' => 'Alain'
},
$VAR1->[0],
$VAR1->[0],
$VAR1->[0]
]
But when I try to print each row inside the loop, it shows a different row hash each time.

The problem is that you're using a single hash %row, so \%row is always referring to the same hash. Every time you assign to %row, you're not setting it to a new hash, you're just clearing out the same hash and repopulating it (thereby affecting, indirectly, every element of your array).
To fix this, you need to create a new hash in each loop iteration. The minimal change to your code would be to declare %row as a lexical variable with local scope, by using the my operator:
my %row = map { $_ => shift @columnValues } @columnNames;
push @$hash, \%row;
Another option is to eliminate the intermediate variable entirely, and just generate a reference to a new anonymous hash on each pass:
push @$hash, { map { $_ => shift @columnValues } @columnNames };
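The effect of reusing one hash can be reproduced in a few lines. This is only a sketch with invented column names and rows; it fills the hashes with a slice assignment instead of map to keep the two loops short.

```perl
use strict;
use warnings;

my @names = qw(emp_no first_name);
my @lines = ('11000,Alain', '11001,Bea');    # made-up rows

# One shared hash: every pushed reference points at the same data.
my (@shared, %row);
for my $line (@lines) {
    @row{@names} = split /,/, $line;    # overwrites the same hash each time
    push @shared, \%row;
}

# A fresh lexical hash per iteration: each reference is distinct.
my @fresh;
for my $line (@lines) {
    my %fresh_row;
    @fresh_row{@names} = split /,/, $line;
    push @fresh, \%fresh_row;
}

print "shared: $shared[0]{emp_no} $shared[1]{emp_no}\n";
print "fresh:  $fresh[0]{emp_no} $fresh[1]{emp_no}\n";
```

In @shared both elements show the last row read, which is why Data::Dumper prints the later entries as $VAR1->[0].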

If you can't get a map to work properly, use a foreach loop instead. Being able to maintain the code is more important than being clever.
#!/usr/bin/env perl
use strict;
use warnings;
# --------------------------------------
use Data::Dumper;
# Make Data::Dumper pretty
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent = 1;
# Set maximum depth for Data::Dumper, zero means unlimited
local $Data::Dumper::Maxdepth = 0;
# --------------------------------------
# open($empOutFh,">empOut.txt")
my $emp_file = 'empOut.txt';
open my $emp_out_fh, '>', $emp_file or die "could not open $emp_file: $!\n";
# $hash= [];
my @emps = ();
my @columnNames = ();
# while(<$empFh>) {
while( my $line = <$empFh> ){
  # chomp the line that was read, not $_
  chomp $line;
  # @columnNames = split /,/, $_ if $.==1;
  if( $. == 1 ){
    @columnNames = split /,/, $line;
    next;
  }
  # @columnValues = split /,/, $_;
  my @columnValues = split /,/, $line;
  my %row = ();
  # %row = map{$_=>shift @columnValues}@columnNames;
  for my $i ( 0 .. $#columnNames ){
    $row{$columnNames[$i]} = $columnValues[$i];
  }
  # push @$hash,\%row;
  push @emps, \%row;
# }
}
# print Dumper($hash);
print Dumper \@emps;
