How do I do a simple Perl hash equivalence comparison? - perl

I'm wondering if there's an idiomatic one-liner or a standard-distribution package/function that I can use to compare two Perl hashes with only builtin, non-blessed types. The hashes are not identical (they don't have equivalent memory addresses).
I'd like to know the answer for both for shallow hashes and hashes with nested collections, but I understand that shallow hashes may have a much simpler solution.
TIA!

Something like cmp_deeply available in Test::Deep ?

[This was a response to an answer by someone who deleted their answer.]
Uh oh!
%a ~~ %b && [sort values %a] ~~ [sort values %b]
doesn't check whether the values belong to the same keys.
#! perl
use warnings;
use strict;
my %a = (eat => "banana", say => "whu whu"); # monkey
my %b = (eat => "whu whu", say => "banana"); # gorilla
print "Magilla Gorilla is always right\n"
if %a ~~ %b && [sort values %a] ~~ [sort values %b];

I don't know if there's an easy way or a built-in package, and I don't know what happens when you just do %hash1 == %hash2 (but that's probably not it), but it's not terribly hard to roll your own:
sub hash_comp (\%\%) {
my %hash1 = %{ shift };
my %hash2 = %{ shift };
foreach (keys %hash1) {
return 1 unless defined $hash2{$_} and $hash1{$_} == $hash2{$_};
delete $hash1{$_};
delete $hash2{$_};
}
return 1 if keys $hash2;
return 0;
}
Untested, but should return 0 if the hashes have all the same elements and all the same values. This function will have to be modified to account for multidimensional hashes.
If you want something from a standard distribution, you could use Data::Dumper; and just dump the two hashes into two scalar variables, then compare the strings for equality. That might work.
There's also a package on CPAN called FreezeThaw that looks like it does what you want.
Note that to use the smart match (not repeated here because it's already posted), you will have to use feature; and it's only available for Perl 5.10. But who's still using Perl 5.8.8, right?

Thanks for your question.
I used Test::More::eq_hash as result.

hashes can be casted into arrays, where every value follows its key (but you won't know the order of the keys). So:
( join("",sort(%hash1)) eq join("",sort(%hash2)) )
Oh, wait, that won't work because there are some edge cases, like:
%hash1 = { 'aaa' => 'aa' };
%hash2 = { 'aa' => 'aaa' };
So it's best to use a character in the join() that you KNOW will never appear in any key or value. If the values are BLOBs, that will be a big problem, but for anything else you could use the NULL char "\0".
( join("\0",sort(%hash1)) eq join("\0",sort(%hash2)) )
Looks kinda ugly, I know, but it will do for checking if two hashes are equal in a shallow way, which is what most people are looking for.

For shallow hashes:
(grep {exists %hash2{$_}} keys %hash1) > 0

You can use eq_deeply in Test::Deep::NoTest. It just returns a boolean that you can check, without the extra overhead of the testing capabilities of the main module.

convert hashes to xml files and compare, and yes you could use multilevel.
sub isEqualHash
{
my ($self,$hash1, $hash2) = #_;
my $file1 = "c:/neo-file1.txt";
my $file2 = "c:/neo-file2.txt";
my $xmlObj = XML::Simple->new();
my $dummy_file = $xmlObj->XMLout($hash1,OutputFile => $file1);
my $dummy_file = $xmlObj->XMLout($hash2,OutputFile => $file2);
open FILE, "<".$file1;
my $file_contents1 = do { local $/; <FILE> };
close(FILE);
open FILE, "<".$file2;
my $file_contents2 = do { local $/; <FILE> };
close(FILE);
if($file_contents1 eq $file_contents2)
{
return "Passed";
}
else
{
return "Failed";
}
}

Related

Perl: safely make hash from list, checking for duplicates

In Perl if you have a list with an even number of elements you can straightforwardly convert it to a hash:
my #a = qw(each peach pear plum);
my %h = #a;
However, if there are duplicate keys then they will be silently accepted, with the last occurrence being the one used. I would like to make a hash checking that there are no duplicates:
my #a = qw(a x a y);
my %h = safe_hash_from_list(#a); # prints error: duplicate key 'a'
Clearly I could write that routine myself:
sub safe_hash_from_list {
die 'even sized list needed' if #_ % 2;
my %r;
while (#_) {
my $k = shift;
my $v = shift;
die "duplicate key '$k'" if exists $r{$k};
$r{$k} = $v;
}
return %r;
}
This, however, is quite a bit slower than the simple assignment. Moreover I do not want to use my own private routine if there is a CPAN module that already does the same job.
Is there a suitable routine on CPAN for safely turning lists into hashes? Ideally one that is a bit faster than the pure-Perl implementation above (though probably never quite as fast as the simple assignment).
If I may be allowed a related follow-up question, I'm also wondering about a tied hash class which allows each key to be assigned only once and dies on reassignment. That would be a more general case of the above problem. Again, I can write such a tied hash myself but I do not want to reinvent the wheel and I would prefer an optimized implementation if one already exists.
Quick way to check that no keys were duplicate would be count the keys and make sure they are equal to half the number of items in the list:
my #a = ...;
my %h = #a;
if (keys %h == (#a / 2)) {
print "Success!";
}

Dereferencing hashes of hashes in Perl

I'm trying to collect the values that I store in a hash of hashes, but I'm kinda confused in how perl does that. So, I create my hash of hashes as follows:
my %hash;
my #items;
#... some code missing here, generally I'm just populating the #items list
#with $currentitem items
while (<FILE>) { #read the file
($a, $b) = split(/\s+/,$_,-1);
$hash{$currentitem} => {$a => $b};
print $hash{$currentitem}{$a} . "\n";#this is a test and it works
}
The above code seems to work. Now, to the point: I have an array #items, which keeps the $currentitem values. And I want to do something like this:
#test = keys %hash{ $items[$num] };
So that I can get all the key/value pairs for a specific item. I've tried the line of code above, as well as
while ( ($key, $value) = each( $hash{$items[$num]} ) ) {
print "$key, $value\n";
}
I've even tried to populate the hash as follows:
$host{ "$currentcustomer" }{"$a"} => "$b";
Which seems to be more correct according to the various online sources I've met. But still, I can't access the data inside that hash... Any ideas?
I am confused by you saying that this works:
$hash{$currentitem} => {$a => $b};
That shouldn't work (and doesn't work for me). The => operator is a special kind of comma, not an assignment (see perlop). In addition, the construct on the right makes a new anonymous hash. Using that, a new anonymous hash would overwrite the old one for each element you tried to add. You would only ever have one element for each $currentitem.
Here is what you want for assignment:
$hash{$currentitem}{$a} = $b;
And here is how to get the keys:
keys %{ $hash{ $items[$num] } };
I suggest reading up on Perl references to get a better handle on this. The syntax can be a bit tricky at first.
Long answer is in perldoc perldsc.
Short answer is:
keys %{ $expr_producing_hash_ref };
In your case I believe it's
keys %{ $hash{$items[$num]} };

How can I determine if an element exists in an array (perl)

I'm looping through an array, and I want to test if an element is found in another array.
In pseudo-code, what I'm trying to do is this:
foreach $term (#array1) {
if ($term is found in #array2) {
#do something here
}
}
I've got the "foreach" and the "do something here" parts down-pat ... but everything I've tried for the "if term is found in array" test does NOT work ...
I've tried grep:
if grep {/$term/} #array2 { #do something }
# this test always succeeds for values of $term that ARE NOT in #array2
if (grep(/$term/, #array2)) { #do something }
# this test likewise succeeds for values NOT IN the array
I've tried a couple different flavors of "converting the array to a hash" which many previous posts have indicated are so simple and easy ... and none of them have worked.
I am a long-time low-level user of perl, I understand just the basics of perl, do not understand all the fancy obfuscated code that comprises 99% of the solutions I read on the interwebs ... I would really, truly, honestly appreciate any answers that are explicit in the code and provide a step-by-step explanation of what the code is doing ...
... I seriously don't grok $_ and any other kind or type of hidden, understood, or implied value, variable, or function. I would really appreciate it if any examples or samples have all variables and functions named with clear terms ($term as opposed to $_) ... and describe with comments what the code is doing so I, in all my mentally deficient glory, may hope to possibly understand it some day. Please. :-)
...
I have an existing script which uses 'grep' somewhat succesfully:
$rc=grep(/$term/, #array);
if ($rc eq 0) { #something happens here }
but I applied that EXACT same code to my new script and it simply does NOT succeed properly ... i.e., it "succeeds" (rc = zero) when it tests a value of $term that I know is NOT present in the array being tested. I just don't get it.
The ONLY difference in my 'grep' approach between 'old' script and 'new' script is how I built the array ... in old script, I built array by reading in from a file:
#array=`cat file`;
whereas in new script I put the array inside the script itself (coz it's small) ... like this:
#array=("element1","element2","element3","element4");
How can that result in different output of the grep function? They're both bog-standard arrays! I don't get it!!!! :-(
########################################################################
addendum ... some clarifications or examples of my actual code:
########################################################################
The term I'm trying to match/find/grep is a word element, for example "word123".
This exercise was just intended to be a quick-n-dirty script to find some important info from a file full of junk, so I skip all the niceties (use strict, warnings, modules, subroutines) by choice ... this doesn't have to be elegant, just simple.
The term I'm searching for is stored in a variable which is instantiated via split:
foreach $line(#array1) {
chomp($line); # habit
# every line has multiple elements that I want to capture
($term1,$term2,$term3,$term4)=split(/\t/,$line);
# if a particular one of those terms is found in my other array 'array2'
if (grep(/$term2/, #array2) {
# then I'm storing a different element from the line into a 3rd array which eventually will be outputted
push(#known, $term1) unless $seen{$term1}++;
}
}
see that grep up there? It ain't workin right ... it is succeeding for all values of $term2 even if it is definitely NOT in array2 ... array1 is a file of a couple thousand lines. The element I'm calling $term2 here is a discrete term that may be in multiple lines, but is never repeated (or part of a larger string) within any given line. Array2 is about a couple dozen elements that I need to "filter in" for my output.
...
I just tried one of the below suggestions:
if (grep $_ eq $term2, #array2)
And this grep failed for all values of $term2 ... I'm getting an all or nothing response from grep ... so I guess I need to stop using grep. Try one of those hash solutions ... but I really could use more explanation and clarification on those.
This is in perlfaq. A quick way to do it is
my %seen;
$seen{$_}++ for #array1;
for my $item (#array2) {
if ($seen{$item}) {
# item is in array2, do something
}
}
If letter case is not important, you can set the keys with $seen{ lc($_) } and check with if ($seen{ lc($item) }).
ETA:
With the changed question: If the task is to match single words in #array2 against whole lines in #array1, the task is more complicated. Trying to split the lines and match against hash keys will likely be unsafe, because of punctuation and other such things. So, a regex solution will likely be the safest.
Unless #array2 is very large, you might do something like this:
my $rx = join "|", #array2;
for my $line (#array1) {
if ($line =~ /\b$rx\b/) { # use word boundary to avoid partial matches
# do something
}
}
If #array2 contains meta characters, such as *?+|, you have to make sure they are escaped, in which case you'd do something like:
my $rx = join "|", map quotemeta, #array2;
# etc
You could use the (infamous) "smart match" operator, provided you are on 5.10 or later:
#!/usr/bin/perl
use strict;
use warnings;
my #array1 = qw/a b c d e f g h/;
my #array2 = qw/a c e g z/;
print "a in \#array1\n" if 'a' ~~ #array1;
print "z in \#array1\n" if 'z' ~~ #array1;
print "z in \#array2\n" if 'z' ~~ #array2;
The example is very simple, but you can use an RE if you need to as well.
I should add that not everyone likes ~~ because there are some ambiguities and, um, "undocumented features". Should be OK for this though.
This should work.
#!/usr/bin/perl
use strict;
use warnings;
my #array1 = qw/a b c d e f g h/;
my #array2 = qw/a c e g z/;
for my $term (#array1) {
if (grep $_ eq $term, #array2) {
print "$term found.\n";
}
}
Output:
a found.
c found.
e found.
g found.
#!/usr/bin/perl
#ar = ( '1','2','3','4','5','6','10' );
#arr = ( '1','2','3','4','5','6','7','8','9' ) ;
foreach $var ( #arr ){
print "$var not found\n " if ( ! ( grep /$var/, #ar )) ;
}
Pattern matching is the most efficient way of matching elements. This would do the trick. Cheers!
print "$element found in the array\n" if ("#array" =~ m/$element/);
Your 'actual code' shouldn't even compile:
if (grep(/$term2/, #array2) {
should be:
if (grep (/$term2/, #array2)) {
You have unbalanced parentheses in your code. You may also find it easier to use grep with a callback (code reference) that operates on its arguments (the array.) It helps keep the parenthesis from blurring together. This is optional, though. It would be:
if (grep {/$term2/} #array2) {
You may want to use strict; and use warnings; to catch issues like this.
The example below might be helpful, it tries to see if any element in #array_sp is present in #my_array:
#! /usr/bin/perl -w
#my_array = qw(20001 20003);
#array_sp = qw(20001 20002 20004);
print "#array_sp\n";
foreach $case(#my_array){
if("#array_sp" =~ m/$case/){
print "My God!\n";
}
}
use pattern matching can solve this. Hope it helps
-QC
1. grep with eq , then
if (grep {$_ eq $term2} #array2) {
print "$term2 exists in the array";
}
2. grep with regex , then
if (grep {/$term2/} #array2) {
print "element with pattern $term2 exists in the array";
}

How to test if a value exist in a hash?

Let's say I have this
#!/usr/bin/perl
%x = ('a' => 1, 'b' => 2, 'c' => 3);
and I would like to know if the value 2 is a hash value in %x.
How is that done?
Fundamentally, a hash is a data structure optimized for solving the converse question, knowing whether the key 2 is present. But it's hard to judge without knowing, so let's assume that won't change.
Possibilities presented here will depend on:
how often you need to do it
how dynamic the hash is
One-time op
grep $_==2, values %x (also spelled grep {$_==1} values %x) will return a list of as many 2s as are present in the hash, or, in scalar context, the number of matches. Evaluated as a boolean in a condition, it yields just what you want.
grep works on versions of Perl as old as I can remember.
use List::Util qw(first); first {$_==2} values %x returns only the first match, undef if none. That makes it faster, as it will short-circuit (stop examining elements) as soon as it succeeds. This isn't a problem for 2, but take care that the returned element doesn't necessarily evaluate to boolean true. Use defined in those cases.
List::Util is a part of the Perl core since 5.8.
use List::MoreUtils qw(any); any {$_==2} values %x returns exactly the information you requested as a boolean, and exhibits the short-circuiting behavior.
List::MoreUtils is available from CPAN.
2 ~~ [values %x] returns exactly the information you requested as a boolean, and exhibits the short-circuiting behavior.
Smart matching is available in Perl since 5.10.
Repeated op, static hash
Construct a hash that maps values to keys, and use that one as a natural hash to test key existence.
my %r = reverse %x;
if ( exists $r{2} ) { ... }
Repeated op, dynamic hash
Use a reverse lookup as above. You'll need to keep it up to date, which is left as an exercise to the reader/editor. (hint: value collisions are tricky)
Shorter answer using smart match (Perl version 5.10 or later):
print 2 ~~ [values %x];
my %reverse = reverse %x;
if( defined( $reverse{2} ) ) {
print "2 is a value in the hash!\n";
}
If you want to find out the keys for which the value is 2:
foreach my $key ( keys %x ) {
print "2 is the value for $key\n" if $x{$key} == 2;
}
Everyone's answer so far was not performance-driven. While the smart-match (~~) solution short circuits (e.g. stops searching when something is found), the grep ones do not.
Therefore, here's a solution which may have better performance for Perl before 5.10 that doesn't have smart match operator:
use List::MoreUtils qw(any);
if (any { $_ == 2 } values %x) {
print "Found!\n";
}
Please note that this is just a specific example of searching in a list (values %x) in this case and as such, if you care about performance, the standard performance analysis of searching in a list apply as discussed in detail in this answer
grep and values
my %x = ('a' => 1, 'b' => 2, 'c' => 3);
if (grep { $_ == 2 } values %x ) {
print "2 is in hash\n";
}
else {
print "2 is not in hash\n";
}
See also: perldoc -q hash
Where $count would be the result:
my $count = grep { $_ == 2 } values %x;
This will not only show you if it's a value in the hash, but how many times it occurs as a value. Alternatively you can do it like this as well:
my $count = grep {/2/} values %x;

Converting code to perl sub, but not sure I'm doing it right

I'm working from a question I posted earlier (here), and trying to convert the answer to a sub so I can use it multiple times. Not sure that it's done right though. Can anyone provide a better or cleaner sub?
I have a good deal of experience programming, but my primary language is PHP. It's frustrating to know how to execute in one language, but not be able to do it in another.
sub search_for_key
{
my ($args) = #_;
foreach $row(#{$args->{search_ary}}){
print "#$row[0] : #$row[1]\n";
}
my $thiskey = NULL;
my #result = map { $args->{search_ary}[$_][0] } # Get the 0th column...
grep { #$args->{search_in} =~ /$args->{search_ary}[$_][1]/ } # ... of rows where the
0 .. $#array; # first row matches
$thiskey = #result;
print "\nReturning: " . $thiskey . "\n";
return $thiskey;
}
search_for_key({
'search_ary' => $ref_cam_make,
'search_in' => 'Canon EOS Rebel XSi'
});
---Edit---
From the answers so far, I've cobbled together the function below. I'm new to Perl, so I don't really understand much of the syntax. All I know is that it throws an error (Not an ARRAY reference at line 26.) about that grep line.
Since I seem to not have given enough info, I will also mention that:
I am calling this function like this (which may or may not be correct):
search_for_key({
'search_ary' => $ref_cam_make,
'search_in' => 'Canon EOS Rebel XSi'
});
And $ref_cam_make is an array I collect from a database table like this:
$ref_cam_make = $sth->fetchall_arrayref;
And it is in the structure like this (if I understood how to make the associative fetch work properly, I would like to use it like that instead of by numeric keys):
Reference Array
Associative
row[1][cam_make_id]: 13, row[1][name]: Sony
Numeric
row[1][0]: 13, row[1][1]: Sony
row[0][0]: 19, row[0][1]: Canon
row[2][0]: 25, row[2][1]: HP
sub search_for_key
{
my ($args) = #_;
foreach my $row(#{$args->{search_ary}}){
print "#$row[0] : #$row[1]\n";
}
print grep { $args->{search_in} =~ #$args->{search_ary}[$_][1] } #$args->{search_ary};
}
You are moving in the direction of a 2D array, where the [0] element is some sort of ID number and the [1] element is the camera make. Although reasonable in a quick-and-dirty way, such approaches quickly lead to unreadable code. Your project will be easier to maintain and evolve if you work with richer, more declarative data structures.
The example below uses hash references to represent the camera brands. An even nicer approach is to use objects. When you're ready to take that step, look into Moose.
use strict;
use warnings;
demo_search_feature();
sub demo_search_feature {
my #camera_brands = (
{ make => 'Canon', id => 19 },
{ make => 'Sony', id => 13 },
{ make => 'HP', id => 25 },
);
my #test_searches = (
"Sony's Cyber-shot DSC-S600",
"Canon cameras",
"Sony HPX-32",
);
for my $ts (#test_searches){
print $ts, "\n";
my #hits = find_hits($ts, \#camera_brands);
print ' => ', cb_stringify($_), "\n" for #hits;
}
}
sub cb_stringify {
my $cb = shift;
return sprintf 'id=%d make=%s', $cb->{id}, $cb->{make};
}
sub find_hits {
my ($search, $camera_brands) = #_;
return grep { $search =~ $_->{make} } #$camera_brands;
}
This whole sub is really confusing, and I'm a fairly regular perl user. Here are some blanket suggestions.
Do not create your own undef ever -- use undef then return at the bottom return $var // 'NULL'.
Do not ever do this: foreach $row, because foreach my $row is less prone to create problems. Localizing variables is good.
Do not needlessly concatenate, for it offends the style god: not this, print "\nReturning: " . $thiskey . "\n";, but print "\nReturning: $thiskey\n";, or if you don't need the first \n: say "Returning: $thiskey;" (5.10 only)
greping over 0 .. $#array; is categorically lame, just grep over the array: grep {} #{$foo[0]}, and with that code being so complex you almost certainly don't want grep (though I don't understand what you're doing to be honest.). Check out perldoc -q first -- in short grep doesn't stop until the end.
Lastly, do not assign an array to a scalar: $thiskey = #result; is an implicit $thiskey = scalar #result; (see perldoc -q scalar) for more info. What you probably want is to return the array reference. Something like this (which eliminates $thiskey)
printf "\nReturning: %s\n", join ', ', #result;
#result ? \#result : 'NULL';
If you're intending to return whether a match is found, this code should work (inefficiently). If you're intending to return the key, though, it won't -- the scalar value of #result (which is what you're getting when you say $thiskey = #result;) is the number of items in the list, not the first entry.
$thiskey = #result; should probably be changed to $thiskey = $result[0];, if you want mostly-equivalent functionality to the code you based this off of. Note that it won't account for multiple matches anymore, though, unless you return #result in its entirety, which kinda makes more sense anyway.