compare multiple hashes for common keys merge values - perl

I have a working bit of code here where I am comparing the keys of six hashes together to find the ones that are common amongst all of them. I then combine the values from each hash into one value in a new hash. What I would like to do is make this scaleable. I would like to be able to easily go from comparing 3 hashes to 100 without having to go back into my code and altering it. Any thoughts on how I would achieve this? The rest of the code already works well for different input amounts, but this is the one part that has me stuck.
my $comparison = List::Compare->new([keys %{$posHashes[0]}], [keys %{$posHashes[1]}], [keys %{$posHashes[2]}], [keys %{$posHashes[3]}], [keys %{$posHashes[4]}], [keys %{$posHashes[5]}]);
my %comboHash;
for ($comparison->get_intersection) {
$comboHash{$_} = ($posHashes[0]{$_} . $posHashes[1]{$_} . $posHashes[2]{$_} . $posHashes[3]{$_} . $posHashes[4]{$_} . $posHashes[5]{$_});
}

my %all;
for my $posHash (#posHashes) {
for my $key (keys(%$posHash)) {
push #{ $all{$key} }, $posHash->{$key};
}
}
my %comboHash;
for my $key (keys(%all)) {
next if #{ $all{$key} } != #posHashes;
$comboHash{$key} = join('', #{ $all{$key} });
}

Just make a subroutine and pass it hash references
my $combination = combine(#posHashes);
sub combine {
my #hashes = #_;
my #keys;
for my $href (#hashes) {
push #keys, keys %$href;
}
# Insert intersection code here..
# .....
my %combo;
for my $href (#hashes) {
for my $key (#intersection) {
$combo{$key} .= $href->{$key};
}
}
return \%combo;
}

Create a subroutine:
sub combine_hashes {
my %result = ();
my #hashes = #_;
my $first = shift #hashes;
for my $element (keys %$first) {
my $count = 0;
for my $href (#hashes) {
$count += (grep {$_ eq $element} (keys %$href));
}
if ($count > $#hashes) {
$result{$element} = $first->{$element};
$result{$element} .= $_->{$element} for #hashes;
}
}
\%result;
}
and call it by:
my %h = %{combine_hashes(\%h1, \%h2, \%h3)};
...or as:
my %h = %{combine_hashes(#posHashes)};

There is pretty straightforward solution:
sub merge {
my $first = shift;
my #hashes = #_;
my %result;
KEY:
for my $key (keys %$first) {
my $accu = $first->{$key};
for my $hash (#hashes) {
next KEY unless exists $hash->{$key};
$accu .= $hash->{$key};
}
$result{$key} = $accu;
}
return \%result;
}
You have to call it with references to hashes and it will return also hash reference e.g.:
my $comboHashRef = merge(#posHashes);

Related

Split string into a hash of hashes (perl)

at the moment im a little confused..
I am looking for a way to write a string with an indefinite number of words (separated by a slash) in a recursive hash.
These "strings" are output from a text database.
Given is for example
"office/1/hardware/mouse/count/200"
the next one can be longer or shorter..
This must be created from it:
{
office {
1{
hardware {
mouse {
count => 200
}
}
}
}
}
Any idea ?
Work backwards. Split the string. Use the last two elements to make the inner-most hash. While more words exist, make each one the key of a new hash, with the inner hash as its value.
my $s = "office/1/hardware/mouse/count/200";
my #word = split(/\//, $s);
# Bottom level taken explicitly
my $val = pop #word;
my $key = pop #word;
my $h = { $key => $val };
while ( my $key = pop #word )
{
$h = { $key => $h };
}
Simple recursive function should do
use strict;
use warnings;
use Data::Dumper;
sub foo {
my $str = shift;
my ($key, $rest) = split m|/|, $str, 2;
if (defined $rest) {
return { $key => foo($rest) };
} else {
return $key;
}
}
my $hash = foo("foo/bar/baz/2");
print Dumper $hash;
Gives output
$VAR1 = {
'foo' => {
'bar' => {
'baz' => '2'
}
}
};
But like I said in the comment: What do you intend to use this for? It is not a terribly useful structure.
If there are many lines to be read into a single hash and the lines have a variable number of fields, you have big problems and the other two answers will clobber data by either smashing sibling keys or overwriting final values. I'm supposing this because there is no rational reason to convert a single line into a hash.
You will have to walk down the hash with each field. This will also give you the most control over the process.
our $hash = {};
our $eolmark = "\000";
while (my $line = <...>) {
chomp $line;
my #fields = split /\//, $line;
my $count = #fields;
my $h = $hash;
my $i = 0;
map { (++$i == $count) ?
($h->{$_}{$eolmark} = 1) :
($h = $h->{$_} ||= {});
} #fields;
}
$h->{$_}{$eolmark} = 1 You need the special "end of line" key so that you can recognize the end of a record and still permit longer records to coexist. If you had two records
foo/bar/baz foo/bar/baz/quux, the second would overwrite the final value of the first.
$h = $h->{$_} ||= {} This statement is a very handy idiom to both create and populate a cache in one step and then take a shortcut reference to it. Never do a hash lookup more than once.
HTH

Searching first level keys in a multidimensional hash Perl

I have the value of a multi-dimensional hash in Perl.
Its structure is
$hash{$key}{$field}{$date} = $value;
Given that I have both the correct value and field.
What is the fastest process possible to search for its key given that the value itself is unique (has a 1-1 relationship to the key)
EDIT:
I have added a third level which is date.
not all dates have a value but when it does, it is shared through all the dates.
To simplify it, if it has a value, it is "A", else, blank.
Regards,
InnZaayynn
The organization of your data is not suited to do a fast search. You're going to have to iterate through the entire hash. If you're going to perform multiple searches, it's best if you generate the "inverse" hash so you only need to iterate through the entire hash once instead of once per search.
If you are performing multiple searches, and they're not all for the same field, generate the inverse hash as follows:
my %key_by_field_and_value;
for my $key (keys(%hash)) {
my $hash_for_key = $hash{$key};
for my $field (keys(%$hash_for_key)) {
my $hash_for_key_and_field = $hash_for_key->{$field};
defined( my $date = get_any_one_key($hash_for_key_and_field) )
or next;
length( my $value = $hash_for_key_and_field->{$date} )
or next;
$key_by_field_and_value{$field}{$value} = $key;
}
}
Then, a search becomes
my $field = ...;
my $target_value = ...;
if (defined(
my $target_key =
do { no autovivification; $key_by_field_and_value{$field}{$target_value} }
)) {
...
}
If you are performing multiple searches, and they're all for the same field, generate the inverse hash as follows:
my $field = ...;
my %key_by_value;
for my $key (keys(%hash)) {
my $hash_for_key = $hash{$key};
defined( my $hash_for_key_and_field = $hash_for_key->{$field} )
or next;
defined( my $date = get_any_one_key($hash_for_key_and_field) )
or next;
length( my $value = $hash_for_key_and_field->{$date} )
or next;
$key_by_value{$value} = $key;
}
Then, a search becomes
my $target_value = ...;
if (defined( my $target_key = $key_by_value{$target_value} )) {
...
}
If you're just going to search once, you'll have to search the entire hash.
my $field = ...;
my $target_value = ...;
my $target_key;
for my $key (keys(%hash)) {
my $hash_for_key = $hash{$key};
defined( my $hash_for_key_and_field = $hash_for_key->{$field} )
or next;
defined( my $date = get_any_one_key($hash_for_key_and_field) )
or next;
length( my $value = $hash_for_key_and_field->{$date} )
or next;
if ($value eq $target_value) {
$target_key = $key;
last;
}
}
if (defined($target_key)) {
...
}
Both of the above solutions use this efficient version of my ($key) = keys(%$h);:
sub get_any_one_key {
my ($h) = #_;
my $key = each(%$h);
keys(%$h); # Reset iterator
return $key;
}

Perl: multidimensional hash

suppose I have the following data
cluster1:d(A),f(C)s,(A)
cluster2:r(D),h(D),f(A)
I want this out put
output:
cluster1:A->2
cluster1:C->1
cluster2:D->2
cluster2:A->1
here is my try,but it is not correct , the part that I am trying to count characters has a problem that I cant fix
the code is a part of very big code ,and I want exactly multidimensional hash
use strict;
use Data::Dumper;
my %count;
while (<DATA>) {
my %HoH;
my ( $cluster, $ch ) = split (/:/,$_);
$HoH{$cluster}={split /[()]+/,$ch};
for my $clust ( keys %HoH ) {
for my $character ( keys %{ $HoH{$clust} } ) {
$count{$clust}{$HoH{$clust}{$character}}++;
}
}
}
print Dumper(\%count);
foreach my $name (sort keys %count) {
foreach my $subject (keys %{$count{$name}}) {
print "$name:$subject->$count{$name}{$subject}\n";
}
}
DATA
cluster1:d(A),f(C)s,(A)
cluster2:r(D),h(D),f(A)
It would be nice if you try to understand the below code so that you can get an idea for solving the problem:-
use strict;
use Data::Dumper;
my $data = "cluster1:A,B,C,A";
my %cluster = ();
my ($cluster_key, $cluster_val ) = split (':', $data);
my #cluster1_data = split(',', $cluster_val);
foreach my $val ( #cluster1_data ) {
$cluster{$cluster_key}{$val}++;
}
print Dumper(\%cluster);
foreach my $clus ( keys %cluster ) {
my $clus_ref = $cluster{$clus};
foreach my $clu ( keys %{ $clus_ref } ){
my $count = $clus_ref->{$clu};
print"$clus:$clu->$count\n";
}
}
Output:
$VAR1 = {
'cluster1' => {
'A' => 2,
'C' => 1,
'B' => 1
}
};
cluster1:A->2
cluster1:C->1
cluster1:B->1
What do you expect $count{$cluster}{$characters}+=1; to do exactly? You have to loop over your input data to populate %count if you expect to get the desired result:
while (<DATA>) {
next unless /^(cluster\d+):(.+)/;
$count{$1}{$_}++ for split/,/, $2;
}
If you also add sort to the second foreach you'll get the output you want.
EDIT: This solves the question for the updated input and requirements:
my %count;
while (<DATA>) {
next unless /^(cluster\d+):(.+)/;
my $cluster = $1;
$count{$cluster}{$_}++ for $2 =~ /\((\w)\)/g;
}
for my $key (sort keys %count) {
for my $value (sort {
$count{$key}{$b} <=> $count{$key}{$a}
} keys %{$count{$key}}) {
print "$key:$value->$count{$key}{$value}\n";
}
}

Optimize perl hash mess

I have Perl code, which looks messy:
my $x = $h->[1];
foreach my $y (keys %$x) {
my $ax = $x->{$y};
foreach my $ay (keys %$ax) {
if (ref($ax->{$ay}) eq 'JE::Object::Proxy') {
my $bx = $ax->{$ay};
if ($$bx->{class_info}->{name} eq 'HTMLImageElement') {
print $$bx->{value}->{src}, "\n";
}
}
}
}
Is it possible to optimize the code above to not use any variables, just $h, as that one is an input?
Here's my crack at it:
print $$_->{value}{src}, "\n" for grep {
ref $_ eq 'JE::Object::Proxy' &&
$$_->{class_info}{name} eq 'HTMLImageElement'
} map {
values %$_
} values %{ $h->[1] };
You're using keys, when you really just want values.
foreach my $h ( grep { ref() eq 'HASH' } values %$x ) {
foreach my $obj (
grep { ref() eq 'JE::Object::Proxy'
and $_->{class_info}{name} eq 'HTMLImageElement'
} values %$h
) {
say $obj->{value}{src};
}
}
A lot of the "messiness" can be cleaned up by reducing your line count and minimizing how much nested code you have. Use the each command to get the next key and its associated value from the hash in one line. [EDIT: as Axeman pointed out, you really only need the values, so I'm replacing my use of each with values]. Also, use a pair of next statement to skip the print statement.
for my $ax (values %{$h->[1]} ) {
for my $bx (values %$ax ) {
next unless ref($bx) eq 'JE::Object::Proxy';
next unless $$bx->{class_info}->{name} eq 'HTMLImageElement';
print "$$bx->{value}->{src}\n";
}
}
Just removing the helper variables is easy, something like this should do it:
foreach my $y (keys %{$h->[1]}) {
foreach my $ax (%{$h->[1]->{$y}) {
foreach my $ay (keys %$ax) {
if(ref($h->[1]->{$y}->{$ay}) eq 'JE::Object::Proxy') {
if($h->[1]->{$y}->{$ay}->{class_info}->{name} eq 'HTMLImageElement') {
print $h->[1]->{$y}->{$ay}->{value}->{src}, "\n";
}
}
}
}
}
You could also remove the duplicated if:
foreach my $y (keys %{$h->[1]}) {
foreach my $ax (%{$h->[1]->{$y}) {
foreach my $ay (keys %$ax) {
if(ref($h->[1]->{$y}->{$ay}) eq 'JE::Object::Proxy' && $h->[1]->{$y}->{$ay}->{class_info}->{name} eq 'HTMLImageElement') {
print $h->[1]->{$y}->{$ay}->{value}->{src}, "\n";
}
}
}
}
But I don't really see how to make it more readable: it is a iteration over a three dimensional structure.

How do you access information in a hash reference that has been passed to a sub-routine?

I am trying to use hash references to pass information to sub-routines. Psuedo code:
sub output_detail {
Here I want to be able to access each record by the key name (ex. "first", "second", etc)
}
sub output_records {
I want to use a foreach to pass each record has reference to another sub-routine
that handles each record.
foreach $key ( sort( keys %someting) ) {
output_detail(something);
}
}
%records = ();
while ($recnum, $first, $second, $third) = db_read($handle)) {
my %rec = ("first"=>$first, "second"=>$second, "third=>$third);
my $id = $recnum;
$records{$id} = \%rec;
}
output_records(\%records);
I'm not sure how to de-reference the hashes when passed to a sub-routine.
Any ideas would be very helpful.
Thanks
Use -> to access keys of a hash ref. So, your argument to output_records will come through as a scalar hash ref.
sub output_records {
my $records = shift;
my $first = $records->{"first"};
}
See perlreftut for more info.
sub output_detail {
my $hash = shift;
my $value = $$hash{some_key};
}
sub output_records {
my $hash = shift;
foreach my $key (sort keys %$hash) {
output_detail($hash, $key);
# or just pass `$$hash{$key}` if you only need the value
}
}