Hash of hashes from arrays in Perl - perl

I am using two arrays (one that holds keys and one that holds values) to build a hash in Perl. I need to count and delete the repeated key/value pairs so that I have a unique hash. Then I have to build a hash of hashes of the form: ((key,value), count).
I am using this line to build the hash from the arrays.
#hash{#keys} = #values;
This will keep repeated key/value pairs in the hash which I do not want. Any help in creating the hash of hashes is appreciated. Thanks.

Let's say you arrays look like this:
my #keys = qw/a b c a a/;
my #values = qw/1 2 3 4 1/;
It's as easy as a simple for loop:
for ($i = 0; $i < scalar #keys; $i++) {
$hash{$keys[$i]}{$values[$i]} += 1;
}
You can then iterate over the inner values (the count) with a nested loop:
foreach my $key (keys %hash) {
while (my ($value, $count) = each %{ $hash{$key} } ) {
print "$key : $value = $count\n";
}
}
Which would produce:
c : 3 = 1
a : 4 = 1
a : 1 = 2
b : 2 = 1
You can sort the keys if you want to:
foreach my $key (sort keys %hash) {
# same as before
}
To get
a : 4 = 1
a : 1 = 2
b : 2 = 1
c : 3 = 1

You can use a hash of hashes, where each key hash a subhash of the form value => count.
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
my #keys = qw( k1 k1 k1 k2 k2 k3 k4 k4 k5);
my #values = qw( v1 v2 v2 v3 v4 v5 v6 v6 v7 );
my %hash;
for my $idx (0 .. $#keys) {
$hash{ $keys[$idx] }{ $values[$idx] }++;
}
print Dumper \%hash;

Related

Hash key value changing to array reference after subroutine in perl

I'm creating keys outside and inside a subroutine on the same hash. However, after the subroutine, the values in the keys I created before the subroutine is called, are now interpreted as array references.
#!/usr/bin/perl
use module;
use strict;
use warnings;
my %hash;
my $count = 0;
my #array = ("a", "b", "c", "d");
for my $letter (#array) {
$hash{$letter} = $count;
$count++;
}
# need "\" to pass in hash otherwise changes
# will get lost outside of subroutine
foreach my $x (sort keys %hash) {
print "first $hash{$x}\n";
}
module::add_ten(\%hash);
foreach my $p (sort keys %hash) {
# $hash{$p} is printing array references, but before it was
# printing the value I desired. What did the subroutine do?
print "second $hash{$p} $hash{$p}->{ten}\n";
}
and here is the module with the subroutine
package module;
sub add_ten {
my $count = 10;
# this passes the full array as reference
my ($hash_ref) = #_; # $hash_ref is actually %hash (yes, the % is not a typo)
my #keys = keys $hash_ref;
foreach my $ltr (sort keys $hash_ref) {
$hash_ref->{$ltr} = { ten => $count };
$count++;
}
}
1;
here is the output:
first 0
first 1
first 2
first 3
second HASH(0x7ff0c3049c50) 10
second HASH(0x7ff0c3049bc0) 11
second HASH(0x7ff0c3049b90) 12
second HASH(0x7ff0c3049b60) 13
I'm expecting the output to be:
first 0
first 1
first 2
first 3
second 0 10
second 1 11
second 2 12
second 3 13
I modified my module:
package module;
sub add_ten {
my $count = 10;
# this passes the full array as reference
my ($hash_ref) = #_; # $hash_ref is actually %hash (yes, the % is not a typo)
my #keys = keys $hash_ref;
foreach my $ltr (sort keys $hash_ref) {
$hash_ref->{$ltr}{ten}=$count;
$count++;
}
}
1;
and the main script (needed to comment out use strict to get it to work):
#!/usr/bin/perl
use module;
#use strict;
use warnings;
my %hash;
my $count = 0;
my #array = ("a", "b", "c", "d");
for my $letter (#array) {
$hash{$letter} = $count;
$count++;
}
# need "\" to pass in hash otherwise changes
# will get lost outside of subroutine
foreach my $x (sort keys %hash) {
print "first $hash{$x}\n";
}
module::add_ten(\%hash);
foreach my $p (sort keys %hash) {
print "second $hash{$p} $hash{$p}{ten}\n";
}
But this is what I was trying to get to.
$hash_ref is a reference to %hash, so when you change the values of the elements of the hash referenced by $hash_ref, you're changing the values of the hash %hash.
That means that when you do
$hash_ref->{$ltr} = { ten => $count };
You are doing
$hash{a} = { ten => 10 };
It should be no surprise that $hash{a} no longer contains zero. You'll have to change your data structure. You could use the following:
$hash{a}{value} = 0;
$hash{a}{subhash}{ten} = 10;

Modify key if it already exists

I'm writing a piece of code that creates a HoAs and loops through for each key. The snippet below shows a basic example of the problem I'm having.
#!/usr/bin/perl
use strict;
use warnings;
my #array1 = qw (1 1 3 4 5); # Note that '1' appears twice
my #array2 = qw (a b c d e);
my #array3 = qw (6 7 8 9 10);
my #array4 = qw (f g h i j);
my %hash;
push #{$hash{$array1[$_]}}, [ $array2[$_], $array3[$_], $array4[$_] ] for 0 .. $#array1;
for my $key (sort keys %hash) {
my ($array2, $array3, $array4) = #{$hash{$key}[-1]};
print "[$key] $array2\t$array3\t$array4\n";
}
Output:
[1] b 7 g
[3] c 8 h
[4] d 9 i
[5] e 10 j
For the data I'm actually using (as opposed to this example) I have been using a key that I've just realised isn't unique, so, as above I end up overriding non-uniqe keys. I'm mainly using these values as keys in order to sort by them later.
My question is either:
A) I can perform the above task for each key unless (exists $hash{$array1}) in which case I can modify it
or
B) Is there a way to sort by those values, in which case I could use another, non-redundant key.
Thanks!
so, as above I end up overriding non-uniqe keys
You aren't. Let's print out the whole contents of that hash:
for my $key (sort { $a <=> $b } keys %hash) { # sort numerically!
for my $array (#{ $hash{$key} }) { # loop over all instead of $hash{$key}[-1] only
say "[$key] " . join "\t", #$array;
}
}
Output:
[1] a 6 f
[1] b 7 g
[3] c 8 h
[4] d 9 i
[5] e 10 j
You would be overriding the values if you were building the hash like
$hash{$array1[$_]} = [ $array2[$_], $array3[$_], $array4[$_] ] for 0 .. $#array1;
(And printing it as)
for my $key ( ... ) {
say "[$key] " . join "\t", #{ $hash{$key} };
}
That is, assigning instead of pushing.
If you want to keep the first value assigned to each key, use the //= operator (assign if undef).

In Perl, how can I sort hash keys using a custom ordering?

I am trying to do work on a hash of files and the work has to be done in a specific order. Most would say the list can be ordered like so:
for my $k (sort keys %my_hash)
{
print "$k=>$my_hash{$k}, ";
}
However, I need nonalphabetical order, in fact the keys start with a word then _ and they go G to digits to L to any of M,P,R,T or D (eg. word_G.txt,word_2.txt,...,word_P.txt). Is there any way to sort by custom order?
Is there any way to sort by custom order?
Yes. See sort.
For example:
#!/usr/bin/env perl
use warnings; use strict;
my #order = qw(G 1 2 3 L M P R T D);
my %order_map = map { $order[$_] => $_ } 0 .. $#order;
my $pat = join '|', #order;
my #input = qw(word_P.txt word_2.txt word_G.txt);
my #sorted = sort {
my ($x, $y) = map /^word_($pat)[.]txt\z/, $a, $b;
$order_map{$x} <=> $order_map{$y}
} #input;
print "#sorted\n";
use 5.014;
sub rank {
my ($word) = #_;
$word =~ s{\A \w+ _}{}msx;
return do {
given ($word) {
0 when /\A G/msx;
1 when /\A [0-9]/msx;
2 when /\A L/msx;
3 when /\A [MPRTD]/msx;
default { 1000 };
}
};
}
say for sort { rank($a) <=> rank($b) } qw(word_P.txt word_2.txt word_G.txt);
Output:
word_G.txt
word_2.txt
word_P.txt
Edit: Before Perl 5.14, use a temporary variable.
use 5.010;
⋮
return do {
my $dummy;
given ($word) {
$dummy = 0 when /\A G/msx;
$dummy = 1 when /\A [0-9]/msx;
$dummy = 2 when /\A L/msx;
$dummy = 3 when /\A [MPRTD]/msx;
default { $dummy = 1000 };
}
$dummy;
};
I had a specific use case where I wanted to sort with certain values first, other values last, then everything else alphabetically in the middle.
Here's my solution:
my #sorted = sort {
my #order = qw(Mike Dave - Tom Joe);
my ($x,$y) = (undef,undef);
for (my $i = 0; $i <= $#order; $i++) {
my $token = $order[$i];
$x = $i if ($token eq $a or (not defined $x and $token eq "-"));
$y = $i if ($token eq $b or (not defined $y and $token eq "-"));
}
$x <=> $y or
$a cmp $b
} #ARGV;
Output:
$ perl customsort.pl Tom Z Mike A Joe X Dave G
Mike Dave A G X Z Tom Joe

Perl - Hash of hash and columns :(

I've a set of strings with variable sizes, for example:
AAA23
AB1D1
A1BC
AAB212
My goal is have in alphabetical order and unique characters collected for COLUMNS, such as:
first column : AAAA
second column : AB1A
and so on...
For this moment I was able to extract the posts through a hash of hashes. But now, how can I sort data? Could I for each hash of hash make a new array?
Thank you very much for you help!
Al
My code:
#!/usr/bin/perl
use strict;
use warnings;
my #sessions = (
"AAAA",
"AAAC",
"ABAB",
"ABAD"
);
my $length_max = 0;
my $length_tmp = 0;
my %columns;
foreach my $string (#sessions){
my $l = length($string);
if ($l > $length_tmp){
$length_max = $l;
}
}
print "max legth : $length_max\n\n";
my $n = 1;
foreach my $string (#sessions){
my #ch = split("",$string);
for my $col (1..$length_max){
$columns{$n}{$col} = $ch[$col-1];
}
$n++;
}
foreach my $col (keys %columns) {
print "colonna : $col\n";
my $deref = $columns{$col};
foreach my $pos (keys %$deref){
print " posizione : $pos --> $$deref{$pos}\n";
}
print "\n";
}
exit(0);
What you're doing is rotating the array. It doesn't need a hash of hash or anything, just another array. Surprisingly, neither List::Util nor List::MoreUtils supplies one. Here's a straightforward implementation with a test. I presumed you want short entries filled in with spaces so the columns come out correct.
#!/usr/bin/perl
use strict;
use warnings;
use Test::More;
use List::Util qw(max);
my #Things = qw(
AAA23
AB1D1
A1BC
AAB212
);
sub rotate {
my #rows = #_;
my $maxlength = max map { length $_ } #rows;
my #columns;
for my $row (#rows) {
my #chars = split //, $row;
for my $colnum (1..$maxlength) {
my $idx = $colnum - 1;
$columns[$idx] .= $chars[$idx] || ' ';
}
}
return #columns;
}
sub print_columns {
my #columns = #_;
for my $idx (0..$#columns) {
printf "Column %d: %s\n", $idx + 1, $columns[$idx];
}
}
sub test_rotate {
is_deeply [rotate #_], [
"AAAA",
"AB1A",
"A1BB",
"2DC2",
"31 1",
" 2",
];
}
test_rotate(#Things);
print_columns(#Things);
done_testing;
You can sort the output of %columns in your code with
foreach my $i (sort { $a <=> $b } keys %columns) {
print join(" " => sort values %{ $columns{$i} }), "\n";
}
This gives
A A A A
A A A C
A A B B
A A B D
But using index numbers as hash keys screams that you should use an array instead, so let's do that. To get the columns, use
sub columns {
my #strings = #_;
my #columns;
while (#strings) {
push #columns => [ sort map s/^(.)//s ? $1 : (), #strings ];
#strings = grep length, #strings;
}
#columns;
}
Given the strings from your question, it returns
A A A A
1 A A B
1 A B B
2 2 C D
1 1 3
2
As you can see, this is unsorted and repeats characters. With Perl, when you see the word unique, always think of hashes!
sub unique_sorted_columns {
map { my %unique;
++$unique{$_} for #$_;
[ sort keys %unique ];
}
columns #_;
}
If you don't mind destroying information, you can have columns sort and filter duplicates:
sub columns {
my #strings = #_;
my #columns;
while (#strings) {
my %unique;
map { ++$unique{$1} if s/^(.)//s } #strings;
push #columns => [ sort keys %unique ];
#strings = grep length, #strings;
}
#columns;
}
Output:
A
1 A B
1 A B
2 C D
1 3
2

Difference of Two Arrays Using Perl

I have two arrays. I need to check and see if the elements of one appear in the other one.
Is there a more efficient way to do it than nested loops? I have a few thousand elements in each and need to run the program frequently.
Another way to do it is to use Array::Utils
use Array::Utils qw(:all);
my #a = qw( a b c d );
my #b = qw( c d e f );
# symmetric difference
my #diff = array_diff(#a, #b);
# intersection
my #isect = intersect(#a, #b);
# unique union
my #unique = unique(#a, #b);
# check if arrays contain same members
if ( !array_diff(#a, #b) ) {
# do something
}
# get items from array #a that are not in array #b
my #minus = array_minus( #a, #b );
perlfaq4 to the rescue:
How do I compute the difference of two arrays? How do I compute the intersection of two arrays?
Use a hash. Here's code to do both and more. It assumes that each element is unique in a given array:
#union = #intersection = #difference = ();
%count = ();
foreach $element (#array1, #array2) { $count{$element}++ }
foreach $element (keys %count) {
push #union, $element;
push #{ $count{$element} > 1 ? \#intersection : \#difference }, $element;
}
If you properly declare your variables, the code looks more like the following:
my %count;
for my $element (#array1, #array2) { $count{$element}++ }
my ( #union, #intersection, #difference );
for my $element (keys %count) {
push #union, $element;
push #{ $count{$element} > 1 ? \#intersection : \#difference }, $element;
}
You need to provide a lot more context. There are more efficient ways of doing that ranging from:
Go outside of Perl and use shell (sort + comm)
map one array into a Perl hash and then loop over the other one checking hash membership. This has linear complexity ("M+N" - basically loop over each array once) as opposed to nested loop which has "M*N" complexity)
Example:
my %second = map {$_=>1} #second;
my #only_in_first = grep { !$second{$_} } #first;
# use a foreach loop with `last` instead of "grep"
# if you only want yes/no answer instead of full list
Use a Perl module that does the last bullet point for you (List::Compare was mentioned in comments)
Do it based on timestamps of when elements were added if the volume is very large and you need to re-compare often. A few thousand elements is not really big enough, but I recently had to diff 100k sized lists.
You can try Arrays::Utils, and it makes it look nice and simple, but it's not doing any powerful magic on the back end. Here's the array_diffs code:
sub array_diff(\#\#) {
my %e = map { $_ => undef } #{$_[1]};
return #{[ ( grep { (exists $e{$_}) ? ( delete $e{$_} ) : ( 1 ) } #{ $_[0] } ), keys %e ] };
}
Since Arrays::Utils isn't a standard module, you need to ask yourself if it's worth the effort to install and maintain this module. Otherwise, it's pretty close to DVK's answer.
There are certain things you must watch out for, and you have to define what you want to do in that particular case. Let's say:
#array1 = qw(1 1 2 2 3 3 4 4 5 5);
#array2 = qw(1 2 3 4 5);
Are these arrays the same? Or, are they different? They have the same values, but there are duplicates in #array1 and not #array2.
What about this?
#array1 = qw( 1 1 2 3 4 5 );
#array2 = qw( 1 1 2 3 4 5 );
I would say that these arrays are the same, but Array::Utils::arrays_diff begs to differ. This is because Array::Utils assumes that there are no duplicate entries.
And, even the Perl FAQ pointed out by mob also says that It assumes that each element is unique in a given array. Is this an assumption you can make?
No matter what, hashes are the answer. It's easy and quick to look up a hash. The problem is what do you want to do with unique values.
Here's a solid solution that assumes duplicates don't matter:
sub array_diff {
my #array1 = #{ shift() };
my #array2 = #{ shift() };
my %array1_hash;
my %array2_hash;
# Create a hash entry for each element in #array1
for my $element ( #array1 ) {
$array1_hash{$element} = #array1;
}
# Same for #array2: This time, use map instead of a loop
map { $array_2{$_} = 1 } #array2;
for my $entry ( #array2 ) {
if ( not $array1_hash{$entry} ) {
return 1; #Entry in #array2 but not #array1: Differ
}
}
if ( keys %array_hash1 != keys %array_hash2 ) {
return 1; #Arrays differ
}
else {
return 0; #Arrays contain the same elements
}
}
If duplicates do matter, you'll need a way to count them. Here's using map not just to create a hash keyed by each element in the array, but also count the duplicates in the array:
my %array1_hash;
my %array2_hash;
map { $array1_hash{$_} += 1 } #array1;
map { $array2_hash{$_} += 2 } #array2;
Now, you can go through each hash and verify that not only do the keys exist, but that their entries match
for my $key ( keys %array1_hash ) {
if ( not exists $array2_hash{$key}
or $array1_hash{$key} != $array2_hash{$key} ) {
return 1; #Arrays differ
}
}
You will only exit the for loop if all of the entries in %array1_hash match their corresponding entries in %array2_hash. Now, you have to show that all of the entries in %array2_hash also match their entries in %array1_hash, and that %array2_hash doesn't have more entries. Fortunately, we can do what we did before:
if ( keys %array2_hash != keys %array1_hash ) {
return 1; #Arrays have a different number of keys: Don't match
}
else {
return; #Arrays have the same keys: They do match
}
You can use this for getting diffrence between two arrays
#!/usr/bin/perl -w
use strict;
my #list1 = (1, 2, 3, 4, 5);
my #list2 = (2, 3, 4);
my %diff;
#diff{ #list1 } = undef;
delete #diff{ #list2 };
You want to compare each element of #x against the element of the same index in #y, right? This will do it.
print "Index: $_ => \#x: $x[$_], \#y: $y[$_]\n"
for grep { $x[$_] != $y[$_] } 0 .. $#x;
...or...
foreach( 0 .. $#x ) {
print "Index: $_ => \#x: $x[$_], \#y: $y[$_]\n" if $x[$_] != $y[$_];
}
Which you choose kind of depends on whether you're more interested in keeping a list of indices to the dissimilar elements, or simply interested in processing the mismatches one by one. The grep version is handy for getting the list of mismatches. (original post)
n + n log n algorithm, if sure that elements are unique in each array (as hash keys)
my %count = ();
foreach my $element (#array1, #array2) {
$count{$element}++;
}
my #difference = grep { $count{$_} == 1 } keys %count;
my #intersect = grep { $count{$_} == 2 } keys %count;
my #union = keys %count;
So if I'm not sure of unity and want to check presence of the elements of array1 inside array2,
my %count = ();
foreach (#array1) {
$count{$_} = 1 ;
};
foreach (#array2) {
$count{$_} = 2 if $count{$_};
};
# N log N
if (grep { $_ == 1 } values %count) {
return 'Some element of array1 does not appears in array2'
} else {
return 'All elements of array1 are in array2'.
}
# N + N log N
my #a = (1,2,3);
my #b=(2,3,1);
print "Equal" if grep { $_ ~~ #b } #a == #b;
Not elegant, but easy to understand:
#!/usr/local/bin/perl
use strict;
my $file1 = shift or die("need file1");
my $file2 = shift or die("need file2");;
my #file1lines = split/\n/,`cat $file1`;
my #file2lines = split/\n/,`cat $file2`;
my %lines;
foreach my $file1line(#file1lines){
$lines{$file1line}+=1;
}
foreach my $file2line(#file2lines){
$lines{$file2line}+=2;
}
while(my($key,$value)=each%lines){
if($value == 1){
print "$key is in only $file1\n";
}elsif($value == 2){
print "$key is in only $file2\n";
}elsif($value == 3){
print "$key is in both $file1 and $file2\n";
}
}
exit;
__END__
Try to use List::Compare. IT has solutions for all the operations that can be performed on arrays.