Inverting a Hash's Keys and Values in Perl

I would like to make the value the key, and the key the value. What is the best way to go about doing this?

Adapted from http://www.dreamincode.net/forums/topic/46400-swap-hash-values/:
Assuming your hash is stored in %hash:
while (($key, $value) = each %hash) {
$hash2{$value}=$key;
}
%hash=%hash2;
A much more elegant solution can be achieved with reverse (http://www.misc-perl-info.com/perl-hashes.html#reverseph):
%nhash = reverse %hash;
Note that with reverse, if several keys share the same value, only one of them survives as that value's key in the result.

Use reverse:
use Data::Dumper;
my %hash = ('month', 'may', 'year', '2011');
print Dumper \%hash;
%hash = reverse %hash;
print Dumper \%hash;

As mentioned, the simplest is
my %inverse = reverse %original;
It "fails" if multiple elements have the same value. You could create an HoA to handle that situation.
my %inverse;
push @{ $inverse{ $original{$_} } }, $_ for keys %original;
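The difference between the two approaches can be seen with a small sketch. The sample data below is hypothetical, chosen so that two keys share a value; %lossy is just an illustrative name.

```perl
use strict;
use warnings;

# Hypothetical sample data: two keys share the value 'red'.
my %original = (apple => 'red', cherry => 'red', banana => 'yellow');

# Plain reverse keeps only one key per duplicated value.
my %lossy = reverse %original;

# Hash of arrays: every original key is kept under its value.
my %inverse;
push @{ $inverse{ $original{$_} } }, $_ for sort keys %original;

print scalar(keys %lossy), " keys after reverse\n";
print "$_ => @{ $inverse{$_} }\n" for sort keys %inverse;
```

Iterating over sort keys is only there to make the order inside each array predictable; it is not required for the inversion itself.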

So you want to swap the keys and values of a hash? Then use reverse... ;)
%hash2 = reverse %hash;
Reversing (k1 => v1, k2 => v2) yields (v2 => k2, v1 => k1), which is what you want. ;)

my %orig_hash = (...);
my %new_hash;
%new_hash = map { $orig_hash{$_} => $_ } keys(%orig_hash);

The map-over-keys solution is more flexible. What if your value is not a simple value?
my %forward;
my %reverse;
# %forward is built such that each key maps to a value that is a hash ref:
# { a => 'something', b => 'something else' }
%reverse = map { join(',', @{ $forward{$_} }{qw(a b)}) => $_ } keys %forward;

Here is a way to do it using Hash::MultiValue.
use experimental qw(postderef);
sub invert {
use Hash::MultiValue;
my $mvh = Hash::MultiValue->from_mixed(shift);
my $inverted;
$mvh->each( sub { push $inverted->{ $_[1] }->@* , $_[0] } ) ;
return $inverted;
}
To test this we can try the following:
my %test_hash = (
q => [qw/1 2 3 4/],
w => [qw/4 6 5 7/],
e => ["8"],
r => ["9"],
t => ["10"],
y => ["11"],
);
my $wow = invert(\%test_hash);
my $wow2 = invert($wow);
use DDP;
print "\n \%test_hash:\n\n" ;
p %test_hash;
print "\n \%test_hash inverted as:\n\n" ;
p $wow ;
# We need to sort the contents of the multi-value array reference
# for the is_deeply() comparison:
map {
$test_hash{$_} = [ sort { $a cmp $b || $a <=> $b } @{ $test_hash{$_} } ]
} keys %test_hash ;
map {
$wow2->{$_} = [ sort { $a cmp $b || $a <=> $b } @{ $wow2->{$_} } ]
} keys %$wow2 ;
use Test::More ;
is_deeply(\%test_hash, $wow2, "double inverted hash == original");
done_testing;
Addendum
Note that in order to pass the gimmicky test here, the invert() function relies on %test_hash having array references as values. To work around this if your hash values are not array references, you can "coerce" the regular/mixed hash into a multi-value hash that Hash::MultiValue can then bless into an object. However, this approach means even single values will appear as array references:
for ( keys %test_hash ) {
if ( ref $test_hash{$_} ne 'ARRAY' ) {
$test_hash{$_} = [ $test_hash{$_} ]
}
}
which is longhand for:
ref($_) or $_ = [ $_ ] for values %test_hash ;
This would only be needed to get the "round trip" test to pass.

Assuming all your values are simple and unique strings, here is one more easy way to do it.
%hash = ( ... );
@newhash{values %hash} = (keys %hash);
This is called a hash slice. Since you're using %newhash to produce a list of keys, you change the % to a @.
Unlike the reverse() method, this will insert the new keys and values in the same order as they were in the original hash. keys and values always return their values in the same order (as does each).
If you need more control over it, like sorting it so that duplicate values get the desired key, use two hash slices.
%hash = ( ... );
@newhash{ @hash{sort keys %hash} } = (sort keys %hash);
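A minimal runnable sketch of both slice forms follows. The data and the names %dupes, @sorted and %by_value are made up for illustration. The second half assumes you want the alphabetically last key to win when values collide.

```perl
use strict;
use warnings;

my %hash = (month => 'may', year => '2011');   # sample data, values unique

# keys and values walk the hash in the same order, so the slice pairs
# each value with the key it came from.
my %newhash;
@newhash{ values %hash } = keys %hash;
print "$_ => $newhash{$_}\n" for sort keys %newhash;

# With duplicate values, sorting the keys first makes the winner predictable:
# slice elements are assigned left to right, so the last key wins.
my %dupes  = (a => 1, b => 1);
my @sorted = sort keys %dupes;
my %by_value;
@by_value{ @dupes{@sorted} } = @sorted;
print "1 => $by_value{1}\n";
```

Using reverse sort instead would make the alphabetically first key win.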

Related

How can I sort an array of hashes by the hashes name?

I want to sort an array of hashes by each hash's key. How can I do that in Perl?
The structure is created like this:
push (@{$structure[$endpoint][1]}, \%temp_hash);
%temp_hash is a simple hash with key => value pairs.
I want to sort that array by the hashes' keys; there is only one key => value pair in each hash. I have been fighting with it for 2 hours already and I gave up.
Try
@sorted = sort { (keys %$a)[0] cmp (keys %$b)[0] } @{$structure[$endpoint][1]};
This sorts the elements of the array (which are hash references) according to the first (only) key of each hash. If the keys are numeric use <=> instead.
Test code:
%a = ( 'a' => 1 );
%b = ( 'zz' => 2 );
%c = ( 'g' => 3);
@arr = (\%a, \%b, \%c);
print "Unsorted\n";
for (@arr)
{
printf "%s\n",((keys %$_)[0]);
}
@sorted = sort { (keys %$a)[0] cmp (keys %$b)[0] } @arr;
print "\nSorted\n";
for (@sorted)
{
printf "%s\n",((keys %$_)[0]);
}

Read hashes in Perl based on keys

I have a hash in Perl as below:
%typeMethodsMap = (
CHECK_REP_EXISTS => "1_abc",
CHECK_JDK_VERSION => "2_abc",
CHECK_BLOCKS_FAILED => "1_xyz",
CHECK_OR_EXISTS => "2_xyz",
CHECK_UPG_EXISTS => "3_xyz",
CHECK_SSO_EXISTS => "4_xyz"
);
When the hash is read, the keys are not returned in the order they were defined but in a seemingly random order. I need to loop over the hash in ascending order, i.e. CHECK_BLOCKS_FAILED, followed by CHECK_OR_EXISTS, CHECK_UPG_EXISTS and CHECK_SSO_EXISTS for the values "1_xyz", "2_xyz", "3_xyz" and "4_xyz" respectively.
Can anybody help me here?
Yes. By design, hash keys are returned in an effectively random order.
There are a number of reasons for that, covered in perlsec and the documentation for keys, but the long and short of it is that if you need a particular key order, you need to use sort.
Or a slice:
my @order = qw ( first second third );
my %hash = ( second => 'a', third => 'b', first => 'c' );
print "@hash{@order}";
Or:
foreach my $key ( @order ) {
print "$key = $hash{$key}\n";
}
Arrays are explicitly ordered numerically. Hashes are explicitly unordered (or random order).
If you're custom sorting, then you can use any function you like that returns -1, 0 or 1 based on the value of the comparison.
cmp does this for strings, and <=> does this for numbers.
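As a quick illustration of why the choice of operator matters, here is a small sketch with made-up sample numbers, sorted once as strings and once as numbers:

```perl
use strict;
use warnings;

my @values = (10, 9, 100);   # sample data

my @as_strings = sort { $a cmp $b } @values;   # string order
my @as_numbers = sort { $a <=> $b } @values;   # numeric order

print "cmp: @as_strings\n";   # cmp: 10 100 9
print "<=>: @as_numbers\n";   # <=>: 9 10 100
```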
For custom sorting, it might look like this:
use strict;
use warnings;
use Data::Dumper;
my %typeMethodsMap = (
CHECK_REP_EXISTS => "1_abc",
CHECK_JDK_VERSION => "2_abc",
CHECK_BLOCKS_FAILED => "1_xyz",
CHECK_OR_EXISTS => "2_xyz",
CHECK_UPG_EXISTS => "3_xyz",
CHECK_SSO_EXISTS => "4_xyz",
);
my @order = qw(
CHECK_REP_EXISTS
CHECK_JDK_VERSION
CHECK_BLOCKS_FAILED
CHECK_OR_EXISTS
CHECK_UPG_EXISTS
CHECK_SSO_EXISTS
);
my $count = 0;
my %magic_order = map { $_ => $count++ } @order;
print Dumper \%magic_order;
sub custom_sort {
return $magic_order{$a} <=> $magic_order{$b};
}
foreach my $key ( sort { custom_sort } keys %typeMethodsMap ) {
print $key,"\n";
}
Although note - this isn't much more efficient, it's merely intended to illustrate 'custom sorting'. Alternatively - if you're wanting to sort based on your 'keys' being sorted:
sub custom_sort {
my ( $a_number, $a_text ) = split ('_',$a);
my ( $b_number, $b_text ) = split ( '_', $b );
if ( $a_number == $b_number ) {
return $a_text cmp $b_text;
}
else {
return $a_number <=> $b_number
}
}
This will sort numerically first, and then alphabetically second. (Swap the <=> and cmp if you want the opposite).
If you know what the keys are then you can write just
for my $key (qw/ 1_CHECK_BLOCKS_FAILED 2_CHECK_OR_EXISTS 3_CHECK_UPG_EXISTS /) {
...
}
Otherwise you must either keep track of the order of the keys in a separate array when you are building the hash, or use something like the Tie::Hash::Indexed module (there are several similar ones) which maintains the order of hash data.
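The "separate array" option could look roughly like the sketch below. The store() helper and @insertion_order are hypothetical names introduced here for illustration. It only uses core Perl and does not show the Tie::Hash::Indexed route.

```perl
use strict;
use warnings;

my (%typeMethodsMap, @insertion_order);   # @insertion_order is a hypothetical helper array

# Record each key once, at the moment it is first stored.
sub store {
    my ($key, $value) = @_;
    push @insertion_order, $key unless exists $typeMethodsMap{$key};
    $typeMethodsMap{$key} = $value;
    return;
}

store(CHECK_BLOCKS_FAILED => '1_xyz');
store(CHECK_OR_EXISTS     => '2_xyz');
store(CHECK_UPG_EXISTS    => '3_xyz');
store(CHECK_SSO_EXISTS    => '4_xyz');

# Walk the hash in the order the keys were added.
print "$_ = $typeMethodsMap{$_}\n" for @insertion_order;
```

Routing every write through one helper keeps the array and the hash from drifting apart; deleting keys would need a matching helper.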

How to flatten a single nested hash key?

I have a data structure flattened by Hash::Flatten
For example,
flatten( { a => [ 'x', { b => 'y' } ] } )
produces
my $flat = {
'a:0' => 'x',
'a:1.b' => 'y',
};
I want to generate a flattened hash key from a list of keys and indexes of the kind that Data::Diver's functions would accept.
For example,
my @key = ('a', 1, 'b');
should return
my $key = "a:1.b";
I have looked at Hash::Flatten, but it seems it can only flatten the whole hash, which is not what I am looking for. I just want to flatten a single (nested) key at a time.
To avoid replicating the escaping mechanism of Hash::Flatten, I tried the following:
use Data::Diver qw( DiveVal );
use Hash::Flatten qw( flatten );
my @key = ('a', 1, 'b');
DiveVal(my $h = {}, @key) = 1;
my ($key) = keys(%{ flatten($h) });
But that can just as easily return a:0 as a:1.b. Does anyone have any recommendations?
Only the key in which you are interested will have a defined value, so only a small change is needed.
use Data::Diver qw( DiveVal );
use Hash::Flatten qw( flatten );
sub flat_key {
DiveVal(my $h = {}, @_) = 1;
my $flat = flatten($h);
return ( grep $flat->{$_}, keys(%$flat) )[0];
}
my @key = ('a', 1, 'b');
my $key = flat_key(@key); # a:1.b
Because this uses Data::Diver, you can also use references to indicate that a number is really a hash key.
my @key = ('a', 1, 'b');
my $key = flat_key(map \$_, @key); # a.1.b
Alternatively, the escaping mechanism is well documented.
sub _flat_key_escape {
my ($s) = @_;
$s =~ s/([\\.:])/\\$1/g;
return $s;
}
sub flat_key {
my $key;
die("usage") if !@_;
for my $subkey (@_) {
if (ref($subkey)) { $key .= '.' . _flat_key_escape($$subkey); }
elsif ($subkey !~ /^-?[0-9]+\z/) { $key .= '.' . _flat_key_escape($subkey); }
else { $key .= ':' . _flat_key_escape($subkey); }
}
return substr($key, 1);
}
This is simple to do without reference to either Hash::Flatten or Data::Diver. The latter's DiveVal distinguishes between hash keys and array indices using the regex /^-?\d+$/, so we can do the same to decide whether an item in the sequence should be preceded, in Hash::Flatten's default notation, by a colon : (array index) or a dot . (hash key).
That gives the subroutine flatten_key below
use strict;
use warnings;
use 5.010;
my @key = ('a', 1, 'b');
my $key = flatten_key(@key);
say $key;
say flatten_key(qw/ a b c 1 2 3 /);
sub flatten_key {
join '', shift, map /^-?\d+$/ ? ":$_" : ".$_", @_;
}
output
a:1.b
a.b.c:1:2:3
Update
If you need to use the Data::Diver convention that any value passed as a scalar reference is a hash key, even if it looks like a number, then you can expand that subroutine like this. It's slightly more awkward because the first item in the sequence needs to be processed as well, but for some reason it doesn't take a delimiter character. So I've chosen to add a delimiter to all the items and then remove it from the first.
say flatten_key('a', 'b', \1, \2, 'c', 'd', 1, 2);
sub flatten_key {
my @key = map {
ref() ? ".$$_" :
/^-?\d+$/ ? ":$_" :
".$_"
} @_;
$key[0] =~ s/^[:.]//;
join '', @key;
}
output
a.b.1.2.c.d:1:2
Update
Also accounting for hash keys that themselves contain dots or colons:
say flatten_key(qw/ a .. :: b /);
sub flatten_key {
my @key = map {
(my $s = ref() ? $$_ : $_) =~ s/(?=[:.\\])/\\/g;
/^-?\d+$/ ? ":$s" : ".$s"
} @_;
$key[0] =~ s/^[:.]//;
join '', @key;
}
output
a.\.\..\:\:.b

Perl extract range of elements from a hash

If I have a hash:
%hash = ("Dog",1,"Cat",2,"Mouse",3,"Fly",4);
How can I extract the first X elements of this hash. For example if I want the first 3 elements, %newhash would contain ("Dog",1,"Cat",2,"Mouse",3).
I'm working with large hashes (~ 8000 keys).
"first X elements of this hash" doesn't mean anything. First three elements in order by numeric value?
my %hash = ( 'Dog' => 1, 'Cat' => 2, 'Mouse' => 3, 'Fly' => 4 );
my @hashkeys = sort { $hash{$a} <=> $hash{$b} } keys %hash;
splice(@hashkeys, 3);
my %newhash;
@newhash{@hashkeys} = @hash{@hashkeys};
You might want to use something like this:
use feature 'say';
my %hash = ("Dog",1,"Cat",2,"Mouse",3,"Fly",4);
for ( (sort keys %hash)[0..2] ) {
say $hash{$_};
}
You should build an array first:
my %hash = ("Dog" => 1,"Cat"=>2,"Mouse"=>3,"Fly"=>4);
my @array;
foreach my $value (sort {$hash{$a} <=> $hash{$b} }
keys %hash)
{
push(@array,{$value=>$hash{$value}});
}
#get range:
my @part=@array[0..2];
#print part of result:
print $part[1]{'Cat'}."\n";

What's the best practice for Perl hashes with array values?

What is the best practice to solve this?
if (... )
{
push (@{$hash{'key'}}, @array ) ;
}
else
{
$hash{'key'} ="";
}
Is it bad practice to store an array under one key and just an empty string (double quotes) under another key in the same hash?
I'm not sure I understand your question, but I'll answer it literally as asked for now...
my @array = (1, 2, 3, 4);
my $arrayRef = \@array; # alternatively: my $arrayRef = [1, 2, 3, 4];
my %hash;
$hash{'key'} = $arrayRef; # or again: $hash{'key'} = [1, 2, 3, 4]; or $hash{'key'} = \@array;
The crux of the problem is that arrays and hashes can only hold scalar values... so you need to take a reference to your array or hash and use that as the value.
See perlref and perlreftut for more information.
EDIT: Yes, you can store empty strings as values for some keys and references (to arrays, hashes, scalars, typeglobs/filehandles, and so on) for other keys. Either way, they're all still scalars.
You'll want to look at the ref function for figuring out how to disambiguate between the reference types and normal scalars.
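A short sketch of what that check can look like, using a made-up hash that mixes references and a plain empty string:

```perl
use strict;
use warnings;

my %hash = (
    list  => [1, 2, 3],      # array reference
    table => { a => 1 },     # hash reference
    empty => "",             # plain scalar
);

for my $key (sort keys %hash) {
    my $type = ref $hash{$key};          # '' for non-references
    print "$key: ", ($type || 'plain scalar'), "\n";
}
```

ref returns the empty string for anything that is not a reference, which is why a simple truth test is enough to separate the two cases.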
It's probably simpler to use explicit array references:
my $arr_ref = \@array;
$hash{'key'} = $arr_ref;
Actually, doing the above and using push result in the same data structure:
use Data::Dumper;
my @array = qw/ one two three four five /;
my $arr_ref = \@array;
my %hash;
my %hash2;
$hash{'key'} = $arr_ref;
print Dumper \%hash;
push @{$hash2{'key'}}, @array;
print Dumper \%hash2;
This gives:
$VAR1 = {
'key' => [
'one',
'two',
'three',
'four',
'five'
]
};
$VAR1 = {
'key' => [
'one',
'two',
'three',
'four',
'five'
]
};
Using explicit array references uses fewer characters and is easier to read than the push @{$hash{'key'}}, @array construct, IMO.
Edit: For your else{} block, it's probably less than ideal to assign an empty string. It would be a lot easier to just skip the if-else construct and, later on when you're accessing values in the hash, to do an if( defined( $hash{'key'} ) ) check. That's a lot closer to standard Perl idiom, and you don't waste memory storing empty strings in your hash.
If you do store empty strings, you'll instead have to use ref() to find out what kind of data you have in each value, and that is less clear than just doing a defined-ness check.
I'm not sure what your goal is, but there are several things to consider.
First, if you are going to store an array, do you want to store a reference to the original value or a copy of the original values? In either case, I prefer to avoid the dereferencing syntax and take references when I can:
$hash{key} = \@array; # just a reference
use Clone qw(clone); # or a similar module
$hash{key} = clone( \@array );
Next, do you want to add to the values that exist already, even if it's a single value? If you are going to have array values, I'd make all the values arrays even if you have a single element. Then you don't have to decide what to do and you remove a special case:
$hash{key} = [] unless defined $hash{key};
push @{ $hash{key} }, @values;
That might be your "best practice" answer, which is often the technique that removes as many special cases and extra logic as possible. When I do this sort of thing in a module, I typically have an add_value method that encapsulates this magic where I don't have to see it or type it more than once.
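Such a helper might look roughly like this. It is a sketch only, written as a plain function rather than a method, and its signature (hash reference, key, then values) is an assumption rather than anything the answer specifies.

```perl
use strict;
use warnings;

my %hash;

# Hypothetical helper: every value stored under a key lives in an array ref.
sub add_value {
    my ($h, $key, @values) = @_;
    push @{ $h->{$key} }, @values;   # autovivifies [] on first use
    return scalar @{ $h->{$key} };   # number of values now stored
}

add_value(\%hash, key => 'one');
add_value(\%hash, key => 'two', 'three');

print "@{ $hash{key} }\n";   # one two three
```

Because push autovivifies the array reference, the explicit [] initialisation can be dropped inside the helper.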
If you already have a non-reference value in the hash key, that's easy to fix too:
if( defined $hash{key} and ! ref $hash{key} ) {
$hash{key} = [ $hash{key} ];
}
If you already have non-array reference values that you want to be in the array, you do something similar. Maybe you want an anonymous hash to be one of the array elements:
if( defined $hash{key} and ref $hash{key} eq ref {} ) {
$hash{key} = [ $hash{key} ];
}
Dealing with the revised notation:
if (... )
{
push (@{$hash{'key'}}, @array);
}
else
{
$hash{'key'} = "";
}
we can immediately tell that you are not following the standard advice that protects novices (and experts!) from their own mistakes. You're using a symbolic reference, which is not a good idea.
use strict;
use warnings;
my %hash = ( key => "value" );
my @array = ( 1, "abc", 2 );
my @value = ( 22, 23, 24 );
push(@{$hash{'key'}}, @array);
foreach my $key (sort keys %hash) { print "$key = $hash{$key}\n"; }
foreach my $value (@array) { print "array $value\n"; }
foreach my $value (@value) { print "value $value\n"; }
This does not run:
Can't use string ("value") as an ARRAY ref while "strict refs" in use at xx.pl line 8.
I'm not sure I can work out what you were trying to achieve. Even if you remove the 'use strict;' pragma, the code shown does not detect a change from the push operation.
use warnings;
my %hash = ( key => "value" );
my @array = ( 1, "abc", 2 );
my @value = ( 22, 23, 24 );
push @{$hash{'key'}}, @array;
foreach my $key (sort keys %hash) { print "$key = $hash{$key}\n"; }
foreach my $value (@array) { print "array $value\n"; }
foreach my $value (@value) { print "value $value\n"; }
foreach my $value (@{$hash{'key'}}) { print "h_key $value\n"; }
push @value, @array;
foreach my $key (sort keys %hash) { print "$key = $hash{$key}\n"; }
foreach my $value (@array) { print "array $value\n"; }
foreach my $value (@value) { print "value $value\n"; }
Output:
key = value
array 1
array abc
array 2
value 22
value 23
value 24
h_key 1
h_key abc
h_key 2
key = value
array 1
array abc
array 2
value 22
value 23
value 24
value 1
value abc
value 2
I'm not sure what is going on there.
If your problem is how to replace an empty string value you had stored before with an array onto which you can push your values, this might be the best way to do it:
if ( ... ) {
my $r = \$hash{ $key }; # $hash{ $key } autoviv-ed
$$r = [] unless ref $$r;
push @$$r, @values;
}
else {
$hash{ $key } = "";
}
I avoid multiple hash look-ups by saving a reference to the auto-vivified slot.
Note the code relies on a scalar or an array being the entire universe of things stored in %hash.