Pairs as hash keys - perl

Does anyone know how to make a hash with pairs of strings serving as keys in perl?
Something like...
{
($key1, $key2) => $value1;
($key1, $key3) => $value2;
($key2, $key3) => $value3;
etc....

You can't have a pair of scalars as a hash key, but you can make a multilevel hash:
my %hash;
$hash{$key1}{$key2} = $value1;
$hash{$key1}{$key3} = $value2;
$hash{$key2}{$key3} = $value3;
If you want to define it all at once:
my %hash = ( $key1 => { $key2 => $value1, $key3 => $value2 },
$key2 => { $key3 => $value3 } );
Alternatively, if it works for your situation, you could just concatenate your keys together:
$hash{$key1 . $key2} = $value1; # etc
Or add a delimiter to separate the keys:
$hash{"$key1:$key2"} = $value1; # etc
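One caveat with plain concatenation is that different key pairs can collapse into the same string. The following is a minimal sketch of that problem, using made-up key values, and of how a delimiter that cannot occur in the keys avoids it:

```perl
use strict;
use warnings;

# Hypothetical keys, chosen only to show the collision.
my %plain;
$plain{'ab' . 'c'} = 'first';
$plain{'a' . 'bc'} = 'second';    # same key "abc", so 'first' is overwritten

my %delimited;
$delimited{ join ':', 'ab', 'c' } = 'first';     # key "ab:c"
$delimited{ join ':', 'a', 'bc' } = 'second';    # key "a:bc"

printf "plain: %d key(s), delimited: %d key(s)\n",
    scalar(keys %plain), scalar(keys %delimited);
```

This only helps if the delimiter never appears inside a key, which is why the other answers reach for unusual separator characters.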

You could use an invisible separator to join the coordinates:
Primarily for mathematics, the Invisible Separator (U+2063) provides a separator between characters where punctuation or space may be omitted such as in a two-dimensional index like i⁣j.
#!/usr/bin/env perl
use utf8;
use v5.12;
use strict;
use warnings;
use warnings qw(FATAL utf8);
use open qw(:std :utf8);
use charnames qw(:full :short);
use YAML;
my %sparse_matrix = (
mk_key(34,56) => -1,
mk_key(1200,11) => 1,
);
print Dump \%sparse_matrix;
sub mk_key { join("\N{INVISIBLE SEPARATOR}", @_) }
sub mk_vec { map [split "\N{INVISIBLE SEPARATOR}"], @_ }
~/tmp> perl mm.pl |xxd
0000000: 2d2d 2d0a 3132 3030 e281 a331 313a 2031 ---.1200...11: 1
0000010: 0a33 34e2 81a3 3536 3a20 2d31 0a .34...56: -1.

Usage: a single value addressed by multiple keys can be used to implement a 2D or N-dimensional matrix!
#!/usr/bin/perl -w
use warnings;
use strict;
use Data::Dumper;
my %hash = ();
my ($a, $b, $c) = (2,3,4);
$hash{"$a, $b ,$c"} = 1;
$hash{"$b, $c ,$a"} = 1;
foreach(keys(%hash) )
{
my @a = split(/,/, $_);
print Dumper(@a);
}

I do this:
{ "$key1\x1F$key2" => $value, ... }
Usually with a helper method:
sub getKey {
return join( "\x1F", @_ );
}
{ getKey( $key1, $key2 ) => $value, ... }
----- EDIT -----
Updated the code above to use the ASCII Unit Separator per the recommendation from @chepner above

Use $; implicitly (or explicitly) in your hash keys, used for multidimensional emulation, like so:
my %hash;
$hash{$key1, $key2} = $value; # or %hash = ( $key1.$;.$key2 => $value );
print $hash{$key1, $key2}; # prints $value
You can even set $; to \x1F if needed (the default is \034, from SUBSEP in awk):
local $; = "\x1F";
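To get the individual keys back later, you can split the stored key on the separator. This is a short sketch, assuming the keys themselves never contain the separator; the sample keys are made up, and the core English module is used only to give `$;` a readable name:

```perl
use strict;
use warnings;
use English qw( -no_match_vars );    # $SUBSCRIPT_SEPARATOR is an alias for $;

my %hash;
my ($key1, $key2) = ('row3', 'col7');    # hypothetical sample keys
$hash{$key1, $key2} = 42;                # stored under join($;, 'row3', 'col7')

# Split each composite key back into its parts; \Q...\E quotes the separator.
my @pairs = map { [ split /\Q$SUBSCRIPT_SEPARATOR\E/, $_ ] } keys %hash;

print join(' / ', @{ $pairs[0] }), "\n";
```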

Related

Perl, Split string into Key:Value pairs for hash with lowercase keys without temporary array

Given a string of Key:Value pairs, I want to create a lookup hash but with lowercase values for the keys. I can do so with this code
my $a="KEY1|Value1|kEy2|Value2|KeY3|Value3";
my @a = split '\|', $a;
my %b = map { $a[$_] = ( !($_ % 2) ? lc($a[$_]) : $a[$_]) } 0 .. $#a ;
The resulting Hash would look like this Dumper output:
$VAR1 = {
'key3' => 'Value3',
'key2' => 'Value2',
'key1' => 'Value1'
};
Would it be possible to directly create hash %b without using temporary array @a or is there a more efficient way to achieve the same result?
Edit: I forgot to mention that I cannot use external modules for this. It needs to be basic Perl.
You can use pairmap from List::Util to do this without an intermediate array at all.
use strict;
use warnings;
use List::Util 1.29 'pairmap';
my $str="KEY1|Value1|kEy2|Value2|KeY3|Value3";
my %hash = pairmap { lc($a) => $b } split /\|/, $str;
Note: you should never use $a or $b outside of sort (or List::Util pair function) blocks. They are special global variables for sort, and just declaring my $a in a scope can break all sorts (and List::Util pair functions) in that scope. An easy solution is to immediately replace them with $x and $y whenever you find yourself starting to use them as example variables.
Since the key-value pair has to be around the | you can use a regex
my $v = "KEY1|Value1|kEy2|Value2|KeY3|Value3";
my %h = split /\|/, $v =~ s/([^|]+) \| ([^|]+)/lc($1).q(|).$2/xger;
use strict;
use warnings;
use Data::Dumper;
my $i;
my %hash = map { $i++ % 2 ? $_ : lc } split(/\|/, 'KEY1|Value1|kEy2|Value2|KeY3|Value3');
print Dumper(\%hash);
Output:
$VAR1 = {
'key1' => 'Value1',
'key2' => 'Value2',
'key3' => 'Value3'
};
For fun, here are two additional approaches.
A cheaper one than the original (since the elements are aliased rather than copied into @_):
my %hash = sub { map { $_ % 2 ? $_[$_] : lc($_[$_]) } 0..$#_ }->( ... );
A more expensive one than the original:
my %hash = ...;
@hash{ map lc, keys(%hash) } = delete( @hash{ keys(%hash) } );
More possible solutions using regexes to do all the work, but not very pretty unless you really like regex:
use strict;
use warnings;
my $str="KEY1|Value1|kEy2|Value2|KeY3|Value3";
my %hash;
my $copy = $str;
$hash{lc $1} = $2 while $copy =~ s/^([^|]*)\|([^|]*)\|?//;
use strict;
use warnings;
my $str="KEY1|Value1|kEy2|Value2|KeY3|Value3";
my %hash;
$hash{lc $1} = $2 while $str =~ m/\G([^|]*)\|([^|]*)\|?/g;
use strict;
use warnings;
my $str="KEY1|Value1|kEy2|Value2|KeY3|Value3";
my %hash = map { my ($k, $v) = split /\|/, $_, 2; (lc($k) => $v) }
$str =~ m/([^|]*\|[^|]*)\|?/g;
Here's a solution that avoids mutating the input string, constructing a new string of the same length as the input string, or creating an intermediate array in memory.
The solution here changes the split into looping over a match statement.
#! /usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my $a="KEY1|Value1|kEy2|Value2|KeY3|Value3";
sub normalize_alist_opt {
my ($input) = @_;
my %c;
my $last_key;
while ($input =~ m/([^|]*(\||\z)?)/g) {
my $s = $1;
next unless $s ne '';
$s =~ s/\|\z//g;
if (defined $last_key) {
$c{ lc($last_key) } = $s;
$last_key = undef;
} else {
$last_key = $s;
}
}
return \%c;
}
print Dumper(normalize_alist_opt($a));
A potential solution that operates over the split directly. Perl might recognize and optimize the special case. Although based on discussions here and here, I'm not sure.
sub normalize_alist {
my ($input) = @_;
my %c;
my $last_key;
foreach my $s (split /\|/, $input) {
if (defined $last_key) {
$c{ lc($last_key) } = $s;
$last_key = undef;
} else {
$last_key = $s;
}
}
return \%c;
}

How to split a string into multiple hash keys in perl

I have a series of strings for example
my @strings;
$strings[1] = 'foo/bar/some/more';
$strings[2] = 'also/some/stuff';
$strings[3] = 'this/can/have/way/too/many/substrings';
What I would like to do is to split these strings and store them in a hash as keys like this
my %hash;
$hash{foo}{bar}{some}{more} = 1;
$hash{also}{some}{stuff} = 1;
$hash{this}{can}{have}{way}{too}{many}{substrings} = 1;
I could go on and list my failed attempts, but I don't think they add value to the question, so I will mention just one. Let's say I converted 'foo/bar/some/more' to '{foo}{bar}{some}{more}'. Could I somehow store that in a variable and do something like the following?
my $var = '{foo}{bar}{some}{more}';
$hash$var = 1;
NOTE: THIS DOESN'T WORK, but I hope it only doesn't due to a syntax error.
All help appreciated.
Identical logic to Shawn's answer. But I've hidden the clever hash-walking bit in a subroutine. And I've set the final value to 1 rather than an empty hash reference.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
my @keys = qw(
foo/bar/some/more
also/some/stuff
this/can/have/way/too/many/substrings
);
my %hash;
for (@keys) {
multilevel(\%hash, $_);
}
say Dumper \%hash;
sub multilevel {
my ($hashref, $string) = @_;
my $curr_ref = $hashref;
my @strings = split m[/], $string;
for (@strings[0 .. $#strings - 1]) {
$curr_ref->{$_} //= {};
$curr_ref = $curr_ref->{$_};
}
$curr_ref->{$strings[-1]} = 1;
}
You have to use hash references to walk down thru the list of keys.
use feature 'say';
use Data::Dumper;
my %hash = ();
while( my $string = <DATA> ){
chomp $string;
my @keys = split /\//, $string;
my $hash_ref = \%hash;
for my $key ( @keys ){
$hash_ref->{$key} //= {}; # keep any subtree already created by an earlier path
$hash_ref = $hash_ref->{$key};
}
}
say Dumper \%hash;
__DATA__
foo/bar/some/more
also/some/stuff
this/can/have/way/too/many/substrings
Just use a library.
use Data::Diver qw(DiveVal);
my @strings = (
undef,
'foo/bar/some/more',
'also/some/stuff',
'this/can/have/way/too/many/substrings',
);
my %hash;
for my $index (1..3) {
my $root = {};
DiveVal($root, split '/', $strings[$index]) = 1;
%hash = (%hash, %$root);
}
__END__
(
also => {some => {stuff => 1}},
foo => {bar => {some => {more => 1}}},
this => {can => {have => {way => {too => {many => {substrings => 1}}}}}},
)
I took the easy way out w/'eval':
use Data::Dumper;
%hash = ();
@strings = ( 'this/is/a/path', 'and/another/path', 'and/one/final/path' );
foreach ( @strings ) {
s/\//\}\{/g;
$str = '{' . $_ . '}'; # version 2: remove this line, and then
eval( "\$hash$str = 1;" ); # eval( "\$hash{$_} = 1;" );
}
print Dumper( %hash )."\n";

How to get array values in hash using map function in Perl

I have an array of elements whose fields are joined with #. I want to put them in a hash, using the first field of each element as the key and the remaining fields as the value, after splitting each element on #.
But it is not working.
Ex:
my @arr = qw(9093#AT#BP 8111#BR 7456#VD#AP 7786#WS#ER 9431#BP ); # thousands of entries
What I want is
$hash{9093} = ['AT', 'BP'];
$hash{8111} = ['BR']; # and so on
How can we accomplish this using the map function? I could use a for loop, but I would prefer to use map.
my %hash = map { my ($k, @v) = split /#/; $k => \@v } @arr;
For comparison, the corresponding foreach loop follows:
my %hash;
for (@arr) {
my ($k, @v) = split /#/;
$hash{$k} = \@v;
}
Use split to split on '#', taking the first chunk as the key, and keeping the rest in an array. Then create a hash using the keys and references to the arrays.
use Data::Dumper;
my @arr = qw( 9093#AT#BP 8111#BR 7456#VD#AP 7786#WS#ER 9431#BP );
my %hash = map {
my ($key, @vals) = split '#', $_;
$key => \@vals;
} @arr;
print Dumper \%hash;
No effort shown in your question, but I am on a code freeze so I'll bite :)
I think that a for loop would be more idiomatic Perl here: process the elements one by one, split on #, and then assign into your hash:
use strict;
use warnings;
use Data::Dumper;
my @arr = qw(9093#AT#BP 8111#BR 7456#VD#AP 7786#WS#ER 9431#BP );
my %h;
for my $elem ( @arr ) {
my ($key, @vals) = split /#/, $elem;
$h{$key} = \@vals;
}
print Dumper \%h;
That is easy:
%s = (map {split(/#/, $_, 2)} @arr);
Testing it:
$ cat 1.pl
my @arr = qw(9093#AT#BP 8111#BR 7456#VD#AP 7786#WS#ER 9431#BP );
%s = (map {split(/#/, $_, 2)} @arr);
foreach my $key ( keys %s )
{
print "key: $key, value: $s{$key}\n";
}
$ perl 1.pl
key: 7456, value: VD#AP
key: 8111, value: BR
key: 7786, value: WS#ER
key: 9431, value: BP
key: 9093, value: AT#BP
use strict;
use warnings;
use Data::Dumper;
my @arr = ('9093#AT#BP', '8111#BR', '7456#VD#AP', '7786#WS#ER', '9431#BP' );
my %h = map { map { splice(@$_, 0, 1), $_ } [ split /#/ ] } @arr;
print Dumper \%h;

how to remove duplicate values and create a new perl hash?

my %hash1 = (
a=>192.168.0.1,
b=>192.168.0.1,
c=>192.168.2.2,
d=>192.168.2.3,
e=>192.168.3.4,
f=>192.168.3.4
);
I have a Perl hash like the one above. Keys are device names and values are IP addresses. How do I create a hash with no duplicate IP addresses (like %hash2) from %hash1? (Devices that share an IP are removed.)
my %hash2 = ( c=>192.168.2.2, d=>192.168.2.3 );
First of all, you need to quote your IP addresses, because a bare 192.168.0.1 is a v-string in Perl, which means chr(192).chr(168).chr(0).chr(1).
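A quick way to see the difference is to compare lengths. This is only an illustrative sketch; the variable names are made up:

```perl
use strict;
use warnings;

my $bare   = 192.168.0.1;      # v-string: four characters, not an IP string
my $quoted = '192.168.0.1';    # ordinary 11-character string

print length($bare), "\n";     # 4
print length($quoted), "\n";   # 11
print "bare literal equals the chr() form\n"
    if $bare eq chr(192) . chr(168) . chr(0) . chr(1);
```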
And my variant is:
my %t;
$t{$_}++ for values %hash1; #count values
my @keys = grep
{ $t{ $hash1{ $_ } } == 1 }
keys %hash1; #find keys for slice
my %hash2;
@hash2{ @keys } = @hash1{ @keys }; #hash slice
How about:
my %hash1 = (
a=>'192.168.0.1',
b=>'192.168.0.1',
c=>'192.168.2.2',
d=>'192.168.2.3',
e=>'192.168.3.4',
f=>'192.168.3.4',
);
my (%seen, %out);
while( my ($k,$v) = each %hash1) {
if ($seen{$v}) {
delete $out{$seen{$v}};
} else {
$seen{$v} = $k;
$out{$k} = $v;
}
}
say Dumper\%out;
output:
$VAR1 = {
'c' => '192.168.2.2',
'd' => '192.168.2.3'
};
A solution using the CPAN module List::Pairwise:
use strict;
use warnings;
use List::Pairwise qw( grep_pairwise );
use Data::Dumper;
my %hash1 = (
a => '192.168.0.1',
b => '192.168.0.1',
c => '192.168.2.2',
d => '192.168.2.3',
e => '192.168.3.4',
f => '192.168.3.4'
);
my %count;
for my $ip ( values %hash1 ) { $count{ $ip }++ }
my %hash2 = grep_pairwise { $count{ $b } == 1 ? ( $a => $b ) : () } %hash1;
print Dumper \%hash2;
It's pretty straightforward. First you count the IPs in an auxiliary hash. And then you select only those IPs with a count of one using grep_pairwise from List::Pairwise. The syntax of grep_pairwise is like grep:
my @result = grep_pairwise { ... } @list;
The idea of grep_pairwise is to select the elements of @list two by two, with $a representing the first element of the pair, and $b the second (in this case the IP). (Remember that a hash evaluates to a list of ($key1, $value1, $key2, $value2, ...) pairs in list context).
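If installing a CPAN module is not an option, the core List::Util module (version 1.29 or later) provides pairgrep, which follows the same $a/$b convention. The sketch below reuses the sample data from this answer and is offered as an alternative, not as part of the List::Pairwise solution itself:

```perl
use strict;
use warnings;
use List::Util 1.29 qw( pairgrep );

my %hash1 = (
    a => '192.168.0.1',
    b => '192.168.0.1',
    c => '192.168.2.2',
    d => '192.168.2.3',
    e => '192.168.3.4',
    f => '192.168.3.4',
);

# Count how often each IP occurs.
my %count;
$count{$_}++ for values %hash1;

# In list context pairgrep returns the matching pairs as a flat key/value list.
my %hash2 = pairgrep { $count{$b} == 1 } %hash1;

print join(', ', map { "$_ => $hash2{$_}" } sort keys %hash2), "\n";
```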

How to convert an array into a hash, with variable names mapped as keys in Perl?

I find myself doing this pattern a lot in perl
sub fun {
my $line = $_[0];
my ( $this, $that, $the_other_thing ) = split /\t/, $line;
return { 'this' => $this, 'that' => $that, 'the_other_thing' => $the_other_thing};
}
Obviously I can simplify this pattern by returning the output of a function which transforms a given array of variables into a map, where the keys are the same names as the variables eg
sub fun {
my $line = $_[0];
my ( $this, $that, $the_other_thing ) = split /\t/, $line;
return &to_hash( $this, $that, $the_other_thing );
}
It helps as the quantity of elements get larger. How do I do this? It looks like I could combine PadWalker & closures, but I would like a way to do this using only the core language.
EDIT: thb provided a clever solution to this problem, but I've not checked it because it bypasses a lot of the hard parts(tm). How would you do it if you wanted to rely on the core language's destructuring semantics and drive your reflection off the actual variables?
EDIT2: Here's the solution I hinted at using PadWalker & closures:
use PadWalker qw( var_name );
# Given two arrays, we build a hash by treating the first set as keys and
# the second as values
sub to_hash {
my $keys = $_[0];
my $vals = $_[1];
my %hash;
@hash{@$keys} = @$vals;
return \%hash;
}
# Given a list of variables, and a callback function, retrieves the
# symbols for the variables in the list. It calls the function with
# the generated syms, followed by the original variables, and returns
# that output.
# Input is: Function, var1, var2, var3, etc....
sub with_syms {
my $fun = shift @_;
my @syms = map substr( var_name(1, \$_), 1 ), @_;
$fun->(\@syms, \@_);
}
sub fun {
my $line = $_[0];
my ( $this, $that, $other) = split /\t/, $line;
return &with_syms(\&to_hash, $this, $that, $other);
}
You could use PadWalker to try to get the name of the variables, but that's really not something you should do. It's fragile and/or limiting.
Instead, you could use a hash slice:
sub fun {
my ($line) = @_;
my %hash;
@hash{qw( this that the_other_thing )} = split /\t/, $line;
return \%hash;
}
You can hide the slice in a function to_hash if that's what you desire.
sub to_hash {
my $var_names = shift;
return { map { $_ => shift } @$var_names };
}
sub fun_long {
my ($line) = @_;
my @fields = split /\t/, $line;
return to_hash [qw( this that the_other_thing )], @fields;
}
sub fun_short {
my ($line) = @_;
return to_hash [qw( this that the_other_thing )], split /\t/, $line;
}
But if you insist, here's the PadWalker version:
use Carp qw( croak );
use PadWalker qw( var_name );
sub to_hash {
my %hash;
for (0..$#_) {
my $var_name = var_name(1, \$_[$_])
or croak("Can't determine name of \$_[$_]");
$hash{ substr($var_name, 1) } = $_[$_];
}
return \%hash;
}
sub fun {
my ($line) = @_;
my ($this, $that, $the_other_thing) = split /\t/, $line;
return to_hash($this, $that, $the_other_thing);
}
This does it:
my @part_label = qw( part1 part2 part3 );
sub fun {
my $line = $_[0];
my @part = split /\t/, $line;
my $no_part = $#part_label <= $#part ? $#part_label : $#part;
return map { $part_label[$_] => $part[$_] } (0 .. $no_part);
}
Of course, your code must name the parts somewhere. The code above does it by qw(), but you can have your code autogenerate the names if you like.
[If you anticipate a very large list of *part_labels,* then you should probably avoid the *(0 .. $no_part)* idiom, but for lists of moderate size it works fine.]
Update in response to OP's comment below: You pose an interesting challenge. I like it. How close does the following get to what you want?
sub to_hash ($$) {
my @var_name = @{shift()};
my @value = @{shift()};
$#var_name == $#value or die "$0: wrong number of elements in to_hash()\n";
return map { $var_name[$_] => $value[$_] } (0 .. $#var_name);
}
sub fun {
my $line = $_[0];
return to_hash [qw( this that the_other_thing )], [split /\t/, $line];
}
If I understand you properly you want to build a hash by assigning a given sequence of keys to values split from a data record.
This code seems to do the trick. Please explain if I have misunderstood you.
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Terse++;
my $line = "1111 2222 3333 4444 5555 6666 7777 8888 9999\n";
print Dumper to_hash($line, qw/ class division grade group kind level rank section tier /);
sub to_hash {
my @fields = split ' ', shift;
my %fields = map {$_ => shift @fields} @_;
return \%fields;
}
output
{
'division' => '2222',
'grade' => '3333',
'section' => '8888',
'tier' => '9999',
'group' => '4444',
'kind' => '5555',
'level' => '6666',
'class' => '1111',
'rank' => '7777'
}
For a more general solution which will build a hash from any two lists, I suggest the zip_by function from List::UtilsBy
use strict;
use warnings;
use List::UtilsBy qw/zip_by/;
use Data::Dumper;
$Data::Dumper::Terse++;
my $line = "1111 2222 3333 4444 5555 6666 7777 8888 9999\n";
my %fields = zip_by { $_[0] => $_[1] }
[qw/ class division grade group kind level rank section tier /],
[split ' ', $line];
print Dumper \%fields;
The output is identical to that of my initial solution.
See also the pairwise function from List::MoreUtils which takes a pair of arrays instead of a list of array references.
Aside from parsing the Perl code yourself, a to_hash function isn't feasible using just the core language. The function being called doesn't know whether those args are variables, return values from other functions, string literals, or what have you...much less what their names are. And it doesn't, and shouldn't, care.
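A small sketch of why this is so: inside the callee, a variable, a literal, and a function's return value all arrive as plain scalars with no name attached. The helper names here (describe_args, three) are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: reports what a callee can actually see of its arguments.
sub describe_args {
    return map { ref(\$_) . "=$_" } @_;
}

sub three { return 3 }

my $this = 3;
my @seen = describe_args($this, 3, three());

# All three entries look identical from inside describe_args.
print "$_\n" for @seen;
```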