How to shuffle the values in a hash? - perl

I have a hash of string IDs. What is the best way to shuffle the IDs?
As an example, my hash assigns the following IDs:
this => 0
is => 1
a => 2
test => 3
Now I'd like to randomly shuffle that. An example outcome would be:
this => 1
is => 0
a => 3
test => 2

You could use the shuffle method in List::Util to help out:
use List::Util qw(shuffle);
...
my #values = shuffle(values %hash);
map { $hash{$_} = shift(#values) } (keys %hash);

A hash slice would be the clearest way to me:
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw/shuffle/;
use Data::Dumper;
my %h = (
this => 0,
is => 1,
a => 2,
test => 3,
);
#h{keys %h} = shuffle values %h;
print Dumper \%h;
This has a drawback in that huge hashes would take up a lot of memory as you pull all of their keys and values out. A more efficient (from a memory standpoint) solution would be:
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw/shuffle/;
use Data::Dumper;
my %h = (
this => 0,
is => 1,
a => 2,
test => 3,
);
{ #bareblock to cause #keys to be garbage collected
my #keys = shuffle keys %h;
while (my $k1 = each %h) {
my $k2 = shift #keys;
#h{$k1, $k2} = #h{$k2, $k1};
}
}
print Dumper \%h;
This code has the benefit of only having to duplicate the keys (rather than the keys and values).
The following code doesn't randomize the values (except on Perl 5.8.1 where the order of keys is guaranteed to be random), but it does mix up the order. It does have the benefit of working in place without too much extra memory usage:
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw/shuffle/;
use Data::Dumper;
my %h = (
this => 0,
is => 1,
a => 2,
test => 3,
);
my $k1 = each %h;
while (defined(my $k2 = each %h)) {
#h{$k1, $k2} = #h{$k2, $k1};
last unless defined($k1 = each %h);
}
print Dumper \%h;

Related

How to remove duplicate reference in array?

Consider this array:
my #array = (hashref1, hashref2, hashref3, hashref1);
How can I remove the duplicate reference hashref1.
If by "duplicate reference" you mean they both refer to the same anonymous hash, you can use List::Util's uniq:
#!/usr/bin/perl
use warnings;
use strict;
use List::Util qw{ uniq };
use Data::Dumper;
my $hashref1 = {a => 10};
my $hashref2 = {b => 11};
my $hashref3 = {c => 12};
my #array = ($hashref1, $hashref2, $hashref3, $hashref1);
print Dumper(uniq(#array));
Output:
$VAR1 = {
'a' => 10
};
$VAR2 = {
'b' => 11
};
$VAR3 = {
'c' => 12
};
The stringified versions of the references will be used for comparison, i.e. something like HASH(0x560ee5fdf220). The same references have the same address.
But, if you mean they refer to different objects with the same contents, you need to find a way how to stringify the hashrefs so that the contents is always the same.
#!/usr/bin/perl
use warnings;
use strict;
use List::Util qw{ uniq };
use Data::Dumper;
my $hashref1 = {a => 10, A => 2};
my $hashref2 = {b => 11, B => 3};
my $hashref3 = {c => 12, C => 4};
my $hashref4 = {a => 10, A => 2};
my #array = ($hashref1, $hashref2, $hashref3, $hashref4);
my #serialized = do {
# Comment out the following line to see how 1 and 4
# can be different sometimes.
local $Data::Dumper::Sortkeys = 1;
map Dumper($_), #array;
};
my $VAR1;
print Dumper(map eval, uniq(#serialized));

Recursive sorting in Perl

I have a hash that contains keys that correspond to database subscripts, but the database can have multidimensional records so the key could be a single subscript, or a list of subscripts.
I need to find a way to sort these records so I can print them in a logical order.
Example:
my $data = {
'1,1,1' => 'data1',
'1,2' => 'data2',
'1,1,3' => 'stuff',
'2,1,1' => 'data3',
'2,1,2' => 'data4',
'2,1,3' => 'data blah',
'2,2,2' => 'datawk2n',
'3,1,2' => 'more',
};
# Should print the keys in the properly sorted order
print join "\n", sort some_function keys %$data;
sub some_function {
# Do some sorting magikz
}
I want it to sort by the leftmost subscript first. If the leftmost value is identical I want to move to the next value and compare those. If those are identical I want to continue to the next one ... and so on ... until all possibilities are exhausted.
This will most likely involve some recursion, but I can't figure out how to make recursion work with those fancy $a and $b variables.
What can I put in some_function to get the following output?
1,1,1
1,1,3
1,2
2,1,1
2,1,2
2,1,3
2,2,2
3,1,2
The following is the fastest solution (by far!):
my #sorted_keys =
map { join ',', unpack 'N*', $_ }
sort
map { pack 'N*', split /,/, $_ }
keys(%$data);
If you want something simpler, and still quite fast, you could use a "natural sort".
Sort::Key::Natural
use Sort::Key::Natural qw( natsort );
my #sorted_keys = natsort(keys(%$data));
Sort::Naturally
use Sort::Naturally qw( nsort );
my #sorted_keys = nsort(keys(%$data));
Benchmarks:
Rate SN SKN grt
SN 3769/s -- -40% -88%
SKN 6300/s 67% -- -79%
grt 30362/s 705% 382% --
Benchmark code:
use strict;
use warnings;
use Benchmark qw( cmpthese );
use List::Util qw( shuffle );
use Sort::Key::Natural qw( );
use Sort::Naturally qw( );
my #keys =
shuffle
split ' ',
'1 1,0 1,1 1,1,1 1,1,3 1,2 2,1,1 2,1,2 2,1,3 2,2,2 3,1,2 10,1,1';
sub grt {
my #sorted_keys =
map { join ',', unpack 'N*', $_ }
sort
map { pack 'N*', split /,/, $_ }
#keys;
}
sub SKN { my #sorted_keys = Sort::Key::Natural::natsort(#keys); }
sub SN { my #sorted_keys = Sort::Naturally::nsort(#keys); }
cmpthese(-3, {
grt => \&grt,
SKN => \&SKN,
SN => \&SN,
});
I thought the Sort::Naturally module would help you here, but it seems not
I must have had a bug in my test. This works fine
use Sort::Naturally 'nsort';
say for nsort keys %$data;
I recommend either this or the Sort::Key::Naturally solution as they are the clearest
It is bad practice to chase speed of execution, especially at the expense of readability, before there is evidence that a given solution is too slow. Even then it is foolish to randomly optimise chunks of your code in the hope of making a difference, and your solution should be run through a profiler to discover where it would be most fruitful to make enhancements
There is no need for recursion. This program shows a sort subroutine by_elements which simply compares each item in the list until it finds either a mismatch or the end of one of the lists
In the former case the result is just the comparison of the two differ elements, and in the latter it is a comparison of the number of elements in the two lists
use strict;
use warnings 'all';
use feature 'say';
my $data = {
'1,1,1' => 'data1',
'1,2' => 'data2',
'1,1,3' => 'stuff',
'2,1,1' => 'data3',
'2,1,2' => 'data4',
'2,1,3' => 'data blah',
'2,2,2' => 'datawk2n',
'3,1,2' => 'more',
'10,1,1' => 'odd',
'1,1' => 'simple',
'1,0' => 'simple0',
'1' => 'simpler',
};
say for sort by_elements keys %$data;
sub by_elements {
my ( $aa, $bb ) = map [/\d+/g], $a, $b;
for ( my $i = 0; $i < #$aa and $i < #$bb; ++$i ) {
my $cmp = $aa->[$i] <=> $bb->[$i];
return $cmp if $cmp;
}
return #$aa <=> #$bb;
}
output
1
1,0
1,1
1,1,1
1,1,3
1,2
2,1,1
2,1,2
2,1,3
2,2,2
3,1,2
10,1,1
Use natsort of Sort::Key::Natural:
#!/usr/bin/env perl
use strict;
use warnings;
use v5.10;
use Sort::Key::Natural qw(natsort);
my %data = (
'1,1,1' => 'data1',
'1,2' => 'data2',
'1,1,3' => 'stuff',
'2,1,1' => 'data3',
'2,1,2' => 'data4',
'2,1,3' => 'data blah',
'2,2,2' => 'datawk2n',
'10,1,2' => 'more',
);
say for natsort keys %data;
Outputs:
1,1,1
1,1,3
1,2
2,1,1
2,1,2
2,1,3
2,2,2
10,1,2
No need for recursion, just a loop that you can break out of.
sub some_function {
my #aa = split /,/, $a;
my #bb = split /,/, $b;
my $cmp = 0;
for (my $i=0; $i<#aa || $i<#bb; $i++) {
$cmp = $aa[$i] <=> $bb[$i];
last if $cmp;
}
$cmp;
}
But if your heart is set on a recursive solution, there's
sub aref_sort_recurse {
my ($c,$d) = #_;
#$c ? #$d ? shift #$c <=> shift #$d || aref_sort_recurse($c,$d) ? 1 : -#$d
}
sub some_function {
aref_sort_recurse( [split /,/, $a], [split /,/, $b] )
}

Pairs as hash keys

Does anyone know how to make a hash with pairs of strings serving as keys in perl?
Something like...
{
($key1, $key2) => $value1;
($key1, $key3) => $value2;
($key2, $key3) => $value3;
etc....
You can't have a pair of scalars as a hash key, but you can make a multilevel hash:
my %hash;
$hash{$key1}{$key2} = $value1;
$hash{$key1}{$key3} = $value2;
$hash{$key2}{$key3} = $value3;
If you want to define it all at once:
my %hash = ( $key1 => { $key2 => $value1, $key3 => $value2 },
$key2 => { $key3 => $value3 } );
Alternatively, if it works for your situation, you could just concatenate your keys together
$hash{$key1 . $key2} = $value1; # etc
Or add a delimiter to separate the keys:
$hash{"$key1:$key2"} = $value1; # etc
You could use an invisible separator to join the coordinates:
Primarily for mathematics, the Invisible Separator (U+2063) provides a separator between characters where punctuation or space may be omitted such as in a two-dimensional index like i⁣j.
#!/usr/bin/env perl
use utf8;
use v5.12;
use strict;
use warnings;
use warnings qw(FATAL utf8);
use open qw(:std :utf8);
use charnames qw(:full :short);
use YAML;
my %sparse_matrix = (
mk_key(34,56) => -1,
mk_key(1200,11) => 1,
);
print Dump \%sparse_matrix;
sub mk_key { join("\N{INVISIBLE SEPARATOR}", #_) }
sub mk_vec { map [split "\N{INVISIBLE SEPARATOR}"], #_ }
~/tmp> perl mm.pl |xxd
0000000: 2d2d 2d0a 3132 3030 e281 a331 313a 2031 ---.1200...11: 1
0000010: 0a33 34e2 81a3 3536 3a20 2d31 0a .34...56: -1.
Usage: Multiple keys of a single value in a hash can be used for implementing a 2D matrix or N-dimensional matrix!
#!/usr/bin/perl -w
use warnings;
use strict;
use Data::Dumper;
my %hash = ();
my ($a, $b, $c) = (2,3,4);
$hash{"$a, $b ,$c"} = 1;
$hash{"$b, $c ,$a"} = 1;
foreach(keys(%hash) )
{
my #a = split(/,/, $_);
print Dumper(#a);
}
I do this:
{ "$key1\x1F$key2" => $value, ... }
Usually with a helper method:
sub getKey() {
return join( "\x1F", #_ );
}
{ getKey( $key1, $key2 ) => $value, ... }
----- EDIT -----
Updated the code above to use the ASCII Unit Separator per the recommendation from #chepner above
Use $; implicitly (or explicitly) in your hash keys, used for multidimensional emulation, like so:
my %hash;
$hash{$key1, $key2} = $value; # or %hash = ( $key1.$;.$key2 => $value );
print $hash{$key1, $key2} # returns $value
You can even set $; to \x1F if needed (the default is \034, from SUBSEP in awk):
local $; = "\x1F";

Sorting a hash from the first key to the last (Perl)

I have the following hash, and I wish to keep it in the order I've set it in; is this even possible? If not, do any alternatives exist?
my %hash = ('Key1' => 'Value1', 'Key2' => 'Value2', 'Key3' => 'Value3');
Do I need to write a custom sorting subroutine? What are my options?
Thank you!
http://metacpan.org/pod/Tie::IxHash
use Tie::IxHash;
my %hash;
tie %hash,'Tie::IxHash';
This hash will maintain its order.
See Tie::Hash::Indexed. Quoting its Synopsis:
use Tie::Hash::Indexed;
tie my %hash, 'Tie::Hash::Indexed';
%hash = ( I => 1, n => 2, d => 3, e => 4 );
$hash{x} = 5;
print keys %hash, "\n"; # prints 'Index'
print values %hash, "\n"; # prints '12345'
Try doing this :
print "$_=$hash{$_}\n" for sort keys %hash;
if you want it sorted in alphabetic order.
If you need to retain original order, see other posts.
See http://perldoc.perl.org/functions/sort.html
One possibility is to do the same as you sometimes do with arrays: specify the keys.
for (0..$#a) { # Sorted array keys
say $a[$_];
}
for (sort keys %h) { # Sorted hash keys
say $h{$_};
}
for (0, 1, 3) { # Sorted array keys
say $h{$_};
}
for (qw( Key1 Key2 Key3 )) { # Sorted hash keys
say $h{$_};
}
You can also fetch the ordered values as follows:
my #values = #h{qw( Key1 Key2 Key3 )};
This depends on how you're going to access the data. If you just want to store them and access the last/first values, you can always put hashes in an array and use push() and pop().
#!/usr/bin/env perl
use strict;
use warnings;
use v5.10;
use Data::Dumper;
my #hashes;
foreach( 1..5 ){
push #hashes, { "key $_" => "foo" };
}
say Dumper(\#hashes);

In Perl, how can I find the index of a given value in an array?

$VAR1 = [
'830974',
'722065',
'722046',
'716963'
];
How can I calculate the array index for the value "722065"?
The firstidx function from List::MoreUtils can help:
use strict;
use warnings;
use List::MoreUtils qw(firstidx);
my #nums = ( '830974', '722065', '722046', '716963' );
printf "item with index %i in list is 722065\n", firstidx { $_ eq '722065' } #nums;
__END__
item with index 1 in list is 722065
using List::Util, which is a core module, unlike List::MoreUtils, which is not:
use List::Util qw(first);
my #nums = ( '830974', '722065', '722046', '716963' );
my $index = first { $nums[$_] eq '722065' } 0..$#nums;
Here is how you would find all the positions at which a given value appears:
#!/usr/bin/perl
use strict;
use warnings;
my #x = ( 1, 2, 3, 3, 2, 1, 1, 2, 3, 3, 2, 1 );
my #i = grep { $x[$_] == 3 } 0 .. $#x;
print "#i\n";
If you only need the first index, you should use List::MoreUtils::first_index.
If you only need to look up the one item, use firstidx as others have said.
If you need to do many lookups, build an index.
If your array items are unique, building an index is quite simple. But it's not much more difficult to build one that handles duplicate items. Examples of both follow:
use strict;
use warnings;
use Data::Dumper;
# Index an array with unique elements.
my #var_uniq = qw( 830974 722065 722046 716963 );
my %index_uniq = map { $var_uniq[$_] => $_ } 0..$#var_uniq;
# You could use hash slice assinment instead of map:
# my %index_uniq;
# #index_uniq{ #var_uniq } = 0..$#var_uniq
my $uniq_index_of_722065 = $index_uniq{722065};
print "Uniq 72665 at: $uniq_index_of_722065\n";
print Dumper \%index_uniq;
# Index an array with repeated elements.
my #var_dupes = qw( 830974 722065 830974 830974 722046 716963 722065 );
my %index_dupes;
for( 0..$#var_dupes ) {
my $item = $var_dupes[$_];
# have item in index?
if( $index_dupes{$item} ) {
# Add to array of indexes
push #{$index_dupes{$item}}, $_;
}
else {
# Add array ref with index to hash.
$index_dupes{$item} = [$_];
}
}
# Dereference array ref for assignment:
my #dupe_indexes_of_722065 = #{ $index_dupes{722065} };
print "Dupes 722065 at: #dupe_indexes_of_722065\n";
print Dumper \%index_dupes;
Here's hastily written attempt at a reverse look-up using a hash.
my $VAR1 = [ '830974', '722065', '722046', '716963' ];
my %reverse;
$reverse{$VAR1->[$_]} = $_ for 0 .. #$VAR1 - 1;
print $reverse{722065};
This does not account for arrays with duplicate values. I do not endorse this solution for production code.
check out the Perl FAQ
use strict;
use Data::Dumper;
sub invert
{
my $i=0;
map { $i++ => $_ } #_;
}
my #a = ('a','b','c','d','e');
print Dumper #a;
print Dumper invert #a;