get nbest key-value pairs hash table in Perl - perl

I have this script that use a hash table:
#!/usr/bin/env perl
use strict; use warnings;
my $hash = {
'cat' => {
"félin" => '0.500000',
'chat' => '0.600000',
'chatterie' => '0.300000'
'chien' => '0.01000'
},
'rabbit' => {
'lapin' => '0.600000'
},
'canteen' => {
"ménagère" => '0.400000',
'cantine' => '0.600000'
}
};
my $text = "I love my cat and my rabbit canteen !\n";
foreach my $word (split "\s+", $text) {
print $word;
exists $hash->{$word}
and print "[" . join(";", keys %{ $hash->{$word} }) . "]";
print " ";
}
For now, I have this output:
I love my cat[chat;félin;chatterie;chien] and my rabbit[lapin] canteen[cantine;ménagère] !
I need to have the nbest key value according to the frequencies (stored in my hash). For example, I want to have the 3 best translations according to the frequencies like this:
I love my cat[chat;félin;chatterie] and my rabbit[lapin] canteen[cantine;ménagère] !
How can I change my code to take into account the frequencies of each values and also to print the nbest values ?
Thanks for your help.

The tidiest way to do this is to write a subroutine that returns the N most frequent translations for a given word. I have written best_n in the program below to do that. It uses rev_nsort_by from List::UtilsBy to do the sort succinctly. It isn't a core module, and so may well need to be installed.
I have also used an executable substitution to modify the string in-place.
use utf8;
use strict;
use warnings;
use List::UtilsBy qw/ rev_nsort_by /;
my $hash = {
'cat' => {
'félin' => '0.500000',
'chat' => '0.600000',
'chatterie' => '0.300000',
'chien' => '0.01000',
},
'rabbit' => {
'lapin' => '0.600000',
},
'canteen' => {
'ménagère' => '0.400000',
'cantine' => '0.600000',
}
};
my $text = "I love my cat and my rabbit canteen !\n";
$text =~ s{(\S+)}{
$hash->{$1} ? sprintf '[%s]', join(';', best_n($1, 3)) : $1;
}ge;
print $text;
sub best_n {
my ($word, $n) = #_;
my $item = $hash->{$word};
my #xlate = rev_nsort_by { $item->{$_} } keys %$item;
$n = $n > #xlate ? $#xlate : $n - 1;
#xlate[0..$n];
}
output
I love my [chat;félin;chatterie] and my [lapin] [cantine;ménagère] !

Related

Convert comma separated values to key value pair Perl

I have an array of states in the format
('AL','Alabama','AK','Alaska','AR','Arkansas'...)
which I want formatted like:
[{'AL' => 'Alabama'},...]
This is primarily so that I can more easily loop through using the HTML::Template module (https://metacpan.org/pod/HTML::Template#TMPL_LOOP)
I'm fairly new to perl, so unsure about how to do this sort of action and can't find something similar enough.
Wouldn't the following make more sense for HTML::Template?
states => [ { id => 'AL', name => 'Alabama' }, ... ]
This would allow you to use the following template:
<TMPL_LOOP NAME=states>
<TMPL_VAR NAME=name> (<TMPL_VAR NAME=id>)
</TMPL_LOOP>
To achieve that, you can use the following:
use List::Util 1.29 qw( pairmap );
states => [ pairmap { +{ id => $a, name => $b } } #states ]
That said, you're probably generating HTML.
<select name="state">
<TMPL_LOOP NAME=states>
<option value="<TMPL_VAR NAME=id_html>"><TMPL_VAR NAME=name_html></option>
</TMPL_LOOP>
</select>
To achieve that, you can use the following:
use List::Util 1.29 qw( pairmap );
{
my %escapes = (
'&' => '&',
'<' => '<',
'>' => '>',
'"' => '"',
"'" => ''',
);
sub text_to_html(_) { $_[0] =~ s/([&<>"'])/$escapes{$1}/rg }
}
states => [ pairmap { +{ id_html => $a, name_html => $b } } map text_to_html, #states ]
use List::Util 1.29;
#state_hashes = List::Util::pairmap { +{ $a => $b } } #states;
Unless you need to keep this hash around for later use I think that simply looping through the elements two at a time would be simpler. You can accomplish this type of looping easily with splice:
my #states = ('AL','Alabama','AK','Alaska','AR','Arkansas'...);
while (my ($code, $name) = splice(#states, 0, 2)) {
# operations here
}
Alternatively, you can use this same approach to create the data structure you want:
my #states = ('AL','Alabama','AK','Alaska','AR','Arkansas'...);
my #state_hashes = ();
while (my ($code, $name) = splice(#states, 0, 2)) {
push #state_hashes, { $code => $name };
}
# do w/e you want with #state_hashes
Note: splice will remove elements from #states
bundle_by from List::UtilsBy can easily create this format:
use strict;
use warnings;
use List::UtilsBy 'bundle_by';
my #states = ('AL', 'Alabama', 'AK', 'Alaska', 'AR', 'Arkansas', ... );
my #hashes = bundle_by { +{#_} } 2, #states;
map solution with a few perlish things
my #states = ('AL','Alabama','AK','Alaska','AR','Arkansas','VT','Vermont');
my %states;
map { $states{$states[$_]} = $states[$_+1] unless $_%2 } 0..$#states;

Accessing and modifying a nested hash based on a dot separated string

I have a string as input, say apple.mango.orange = 100
I also have a hash reference:
$inst = {
'banana' => 2,
'guava' => 3,
'apple' => {
'mango' => {
'orange' => 80
}
}
};
I want to modify the value of orange using the input string. Can someone please help me how I could do this?
I tried splitting the string into (key, value) pair. I then did the following on the key string:
my $key2 = "\$inst->{".$key."}";
$key2 =~ s/\./}->{/g;
$$key2 = $value;
This does not work as intended. Can someone help me out here? I have read the Perl FAQ about not using a variable value as variable but I am unable to think of an alternative.
You are building string that consists of (buggy) Perl code, but you never ask Perl to execute it. ...but that's not the right approach.
sub dive_val :lvalue {
my $p = \shift;
$p = \($$p->{$_}) for #_;
$$p
}
my #key = split /\./, "apple.mango.orange";
dive_val($inst, #key) = $value;
or
use Data::Diver qw( DiveVal );
my #key = split /\./, "apple.mango.orange";
DiveVal($inst, map \$_, #key) = $value;
Not only is a symbolic reference a very bad idea here, it doesn't even solve your problem. You're building an expression in $key2, and just jamming another dollar sign in front of its name won't make perl execute that code. For that you would need eval, which is another bad idea
You can install and use the Data::Diver module, which does exactly this sort of thing, or you can simply loop over the list of hash keys, picking up a new hash reference each time and assigning the value to the element with the last key
The biggest issue is actually parsing the incoming string into a list of keys and a value. This code implements a subroutine apply which applies the implied operation in the string to a nested hash. Unless you are confident of your data, it needs some error checking addingto make sure each of the keys in the list exists. The Data:;Dumper output is just to demonstrate the validity of the result
use strict;
use warnings 'all';
use Data::Dumper;
my $inst = { 'banana' => 2, 'guava' => 3, 'apple' => { 'mango' => { 'orange' => 80 } } };
my $s = 'apple.mango.orange = 100';
apply($s, $inst);
print Dumper $inst;
sub apply {
my ($operation, $data) = #_;
my ($keys, $val) = $operation =~ /([\w.]+)\s*=\s*(\d+)/;
my #keys = split /\./, $keys;
my $last = pop #keys;
my $hash = $data;
$hash = $hash->{$_} for #keys;
$hash->{$last} = $val;
}
output
$VAR1 = {
'banana' => 2,
'apple' => {
'mango' => {
'orange' => '100'
}
},
'guava' => 3
};

Perl Program Issue, how to print scalar and array values together of hash

I also faced the same issue and I used this solution. It helped a lot, but it is useful when all values are scalar but my program contains both array and scalar values. so I am able to print scalar values but unable to print array values. Please suggest what we need to add?
Code:
#!/grid/common/bin/perl
use warnings;
require ("file.pl");
while (my ($key, $val) = each %hash)
{
print "$key => $val\n";
}
Non-scalar values require dereferencing, otherwise you will just print out ARRAY(0xdeadbeef) or HASH(0xdeadbeef) with the memory addresses of those data structures.
Have a good read of Perl Data Structure Cookbook: perldoc perldsc
as well as Perl References: perldoc perlref
Since you did not provide your data, here is an example:
#!/usr/bin/env perl
use warnings;
use strict;
my %hash = ( foo => 'bar',
baz => [ 1, 2, 3 ],
qux => { a => 123, b => 234 }
);
while (my ($key, $val) = each %hash) {
my $ref_type = ref $val;
if ( not $ref_type ) {
# SCALAR VARIABLE
print "$key => $val\n";
next;
}
if ('ARRAY' eq $ref_type) {
print "$key => [ " . join(',', #$val) . " ]\n";
} elsif ('HASH' eq $ref_type) {
print "$key => {\n";
while (my ($k, $v) = each %$val) {
print " $k => $v\n";
}
print "}\n";
} else {
# Otherstuff...
die "Don't know how to handle data of type '$ref_type'";
}
}
Output
baz => [ 1,2,3 ]
qux => {
a => 123
b => 234
}
foo => bar
For more complicated structures, you will need to recurse.
Data::Printer is useful for dumping out complicated structures.

Adding each element in an array to a complex hash in Perl

I have an array with n number of elments. I want to add each of the elements to a complex hash, each as a key/value pair. If the number of elements were fixed, say three, I would do:
my %hash;
my #array = ("first", "second", "third");
$hash{$array[0]}{$array[1]}{$array[2]}++;
The structure I want to end up with, is this (printed with Data::Dumper):
$VAR1 = 'first';
$VAR2 = {
'second' => {
'third' => 1
};
But I am at loss at achieving the same structure when the number of elements in the array isn't fixed. Something with anonymous variables and iterating through the array, yes, but something like foreach #array{$hash{$_}++}; will obviously only make one entry per element, and not the desired structure. Help?
Something like this could build the structure you desire for N elements:
use strict;
use warnings;
use Data::Dumper;
my #array = qw(first second third four five six seven);
my $hash;
foreach my $key ( reverse #array ) {
$hash = { $key => $hash };
}
print Dumper $hash;
__END__
$VAR1 = {
'first' => {
'second' => {
'third' => {
'fourth' => {
'fifth' => {
'sixth' => {
'seventh' => undef
}
}
}
}
}
}
};
It is not clear what you really need this for. There may be a better solution if you explain your use-case a little more. Incrementing this structure doesn't appear to be very easy.
After playing around a little, you can increment by traversing the hash reference to the bottom then incrementing the value of the last element. It is not very pretty though :|
# incrementing
my $elem = $hash; # copy the reference
foreach my $key ( #array ) {
# found the bottom of the hash
unless ( $elem->{$key} && ref($elem->{$key}) ) {
$elem->{$key}++;
last;
}
# not at the bottom, move to the next level
$elem = $elem->{$key};
}
print Dumper $hash;
__END__
$VAR1 = {
'first' => {
'second' => {
'third' => {
'fourth' => {
'fifth' => {
'sixth' => {
'seventh' => 1
}
}
}
}
}
}
};
This is relatively simple if you maintain a current hash reference. This short program demonstrates
The first few steps make sure that each hash element exists and its value is a hash reference. $href is moved to the next level of hash at each stage. For the final element of the array, the latest hash level's element is incremented instead of being set to a hash reference.
Whether or not this data structure is the correct choice depends on what else you need to do with it once you have built it
use strict;
use warnings;
my %hash;
my #array = qw/ first second third fourth fifth /;
drill_hash(\%hash, #array);
use Data::Dump;
dd \%hash;
sub drill_hash {
my ($href, #list) = #_;
my $final = pop #list;
$href = $href->{$_} //= {} for #list;
++$href->{$final};
}
output
{
first => { second => { third => { fourth => { fifth => 1 } } } },
}
Update
Having understood your purpose, the simplest way to keep a count of occurrences of ngrams like that is to have a speficic hash key that is used to keep the count of the sequence of words so far.
This program uses the value _COUNT for that key, and you can see that, for example, {under}{a}{_COUNT} and {under}{a}{rock}{_COUNT} are both 1
use strict;
use warnings;
my %counts;
count_ngram(\%counts, qw/ under a /);
count_ngram(\%counts, qw/ a rock /);
count_ngram(\%counts, qw/ under a rock /);
count_ngram(\%counts, qw/ a tree /);
count_ngram(\%counts, qw/ under a tree /);
use Data::Dump;
dd \%counts;
sub count_ngram {
my ($href, #ngram) = #_;
my $final = pop #ngram;
$href = $href->{$_} //= {} for #ngram;
++$href->{$final}{_COUNT};
}
output
{
a => { rock => { _COUNT => 1 }, tree => { _COUNT => 1 } },
under => {
a => { _COUNT => 1, rock => { _COUNT => 1 }, tree => { _COUNT => 1 } },
},
}

Extracting CN's with LDAP?

I have this code
#!/usr/bin/perl
use warnings;
use strict;
use Net::LDAP;
use Data::Dumper;
my $dn="CN=...";
my $password="xxx";
my $ldap = Net::LDAP->new('example.com') or die "$#";
my $mesg = $ldap->bind($dn, password=>$password);
if ($mesg->code) { die "uuuu $mesg"; }
$mesg = $ldap->search(base => "dc=test,dc=example,dc=com", filter => "(name=LIST)",);
my $ref = $mesg->entry->get_value("member", asref => 1);
print Dumper $ref;
foreach my $string (#{$ref}) {
$string =~ /CN=(.+?),.*/;
print $1 . "\n";
}
which outputs the CN's using regular expressions:
aaaa
bbbb
cccc
...
Using Dumper can I see the structure
$VAR1 = [
'CN=aaaa,OU=test,DC=test,DC=example,DC=com',
'CN=bbbb,OU=test,DC=test,DC=example,DC=com',
'CN=cccc,OU=test,DC=test,DC=example,DC=com',
So I am wondering if there is a more "LDAP" way to extract these CN's, instead of using regular expressions?
Update:
Based on Javs answer this is the solution.
my $ref = $mesg->entry->get_value("member", asref => 1);
foreach my $string (#{$ref}) {
print ldap_explode_dn($string)->[0]{CN} . "\n";
}
You can:
use Net::LDAP::Util qw(ldap_explode_dn);
and use it on your attribute like this:
ldap_explode_dn($mesg->entry->get_value('member'));
to get this array of hashes:
$VAR1 = [
{
'CN' => 'aaaa'
},
{
'OU' => 'test'
},
{
'DC' => 'test'
},
{
'DC' => 'example'
},
{
'DC' => 'com'
}
];
You do realize that CN is usually an attribute in LDAP directories?
Why not just query for the attribute CN for all returned objects? Then no parsing required.