How do I find and count duplicate values in a perl hash - perl

I need to find the duplicate values in a perl hash and then output the key/value pair and associated dup count when that count is > 1.
(I could leave a code sample of what I've attempted but that would just result in mass confusion and possibly uncontrolled laughter, and I'm really hoping to make it through life with some semblance of self esteem.)
Hash key/value would look like the following:
%hash = ('FHDJ-124H' => 'hostname1', 'HJDHUR-87878' => 'hostname2', 'HGHDJH-874673' => 'hostname1');
My desired output would be:
2 duplicates found for hostname1
FHDJ-124H
HGHDJH-874673
Using perl 5.6 on Solaris 10. Tightly controlled production environment where upgrading or loading perl mods is not allowed. (A change request for moving to 5.8 is about 6 months out).
Many thanks!

You need to iterate through the hash keys in your first hash (key/value) and accumulate the count of each item you find in another hash (value/count).
If you want to display the keys together with duplicated values, your second hash cannot be as simple as that, since for each duplicated value you will have a collection of keys (all of them having the same value). In this case, simply accumulate the key in an array, then count its elements. I.e., your second hash would be something like (value/[key1,key2,key3...])
my %hash = ( key1 => "one", key2 => "two", key3 => "one", key4 => "two", key5 => "one" );
my %counts = ();
foreach my $key (sort keys %hash) {
my $value = $hash{$key};
if (not exists $counts{$value}) {
$counts{$value} = [];
}
push @{ $counts{$value} }, $key;
};
Then iterate over %counts and print each value whose array in $counts{$value} holds more than one key.
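A minimal sketch of that final step, assuming the value => [keys] structure described above and the sample data from the question. It sticks to constructs that should be available in Perl 5.6; the output wording is copied from the question's desired output.

```perl
use strict;
use warnings;

my %hash = (
    'FHDJ-124H'     => 'hostname1',
    'HJDHUR-87878'  => 'hostname2',
    'HGHDJH-874673' => 'hostname1',
);

# Group keys by value: value => [ key1, key2, ... ]
my %counts;
foreach my $key (sort keys %hash) {
    push @{ $counts{ $hash{$key} } }, $key;
}

# Report only the values that occur more than once.
foreach my $value (sort keys %counts) {
    my @keys = @{ $counts{$value} };
    next unless @keys > 1;
    print scalar(@keys), " duplicates found for $value\n";
    print "$_\n" for @keys;
}
```

Sorting the keys is optional; it only makes the report order predictable.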

This is what you are looking for
#!/usr/bin/perl
use strict;
use warnings;
my %hash = ('FHDJ-124H' => 'hostname1', 'HJDHUR-87878' => 'hostname2', 'HGHDJH-874673' => 'hostname1');
my %reverse;
while (my ($key, $value) = each %hash) {
push @{$reverse{$value}}, $key;
}
while (my ($key, $value) = each %reverse) {
next unless @$value > 1;
print scalar(@$value), " duplicates found \n @$value have the same value $key\n";
}

What about:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dump qw(dump);
my %h = (a=>'v1', b=>'v2', c=>'v1', d=>'v3', e=>'v3');
my %r;
while(my($k,$v)=each%h){
push @{$r{$v}}, {$k=>$v};
}
dump %r;
output:
(
"v1",
[{ c => "v1" }, { a => "v1" }],
"v2",
[{ b => "v2" }],
"v3",
[{ e => "v3" }, { d => "v3" }],
)

Well, off of the top of my head, you could do something like this:
my @values=sort(values(%hash));
my @doubles=();
my %counts=();
foreach my $i (0..$#values)
{
foreach my $j (($i+1)..$#values)
{
if($values[$i] eq $values[$j])
{
push @doubles,$values[$i];
$counts{$values[$i]}++;
}
}
}
foreach(@doubles)
{
print "$_, $counts{$_}\n"; # $_ is a value here, not a key, so $hash{$_} would be undef
}
This is a bit of a naive solution (that I haven't tested, yet), and I'm sure there's a faster and slicker way, but this should work.

Perl: Add hash as sub hash to simple hash

I looked at the other two questions that seem to be about this, but they are a little obtuse and I can't relate them to what I want to do, which I think is simpler. I also think this will be a much clearer statement of a very common problem/task so I'm posting this for the benefit of others like me.
The Problem:
I have 3 files, each file a list of key=value pairs:
settings1.ini
key1=val1
key2=val2
key3=val3
settings2.ini
key1=val4
key2=val5
key3=val6
settings3.ini
key1=val7
key2=val8
key3=val9
No surprise, I want to read those key=value pairs into a hash to operate on them, so...
I have a hash of the filenames:
my %files = ( file1 => 'settings1.ini'
, file2 => 'settings2.ini'
, file3 => 'settings3.ini'
);
I can iterate through the filenames like so:
foreach my $fkey (keys %files) {
say $files{$fkey};
}
Ok.
Now I want to add the list of key=value pairs from each file to the hash as a sub-hash under each respective 'top-level' filename key, such that I can iterate through them like so:
foreach my $fkey (keys %files) {
say "File: $files{$fkey}";
foreach my $vkey (keys %{ $files{$fkey} }) {
say " $vkey: $files{$fkey}{$vkey}";
}
}
In other words, I want to add a second level to the hash such that it goes from just being (in pseudo terms) a single layer list of values:
file1 => settings1.ini
file2 => settings2.ini
file3 => settings3.ini
to being a multi-layered list of values:
file1 => key1 => 'val1'
file1 => key2 => 'val2'
file1 => key3 => 'val3'
file2 => key1 => 'val4'
file2 => key2 => 'val5'
file2 => key3 => 'val6'
file3 => key1 => 'val7'
file3 => key2 => 'val8'
file3 => key3 => 'val9'
Where:
my $fkey = 'file2';
my $vkey = 'key3';
say $files{$fkey}{$vkey};
would print the value
'val6'
As a side note, I am trying to use File::Slurp to read in the key=value pairs. Doing this on a single level hash is fine:
my %new_hash = read_file($files{$fkey}) =~ m/^(\w+)=([^\r\n\*,]*)$/img;
but - to rephrase this whole problem - what I really want to do is 'graft' the new hash of key=value pairs onto the existing hash of filenames 'under' the top $file key as a 'child/branch' sub-hash.
Questions:
How do I do this, how do I build a multi-level hash one layer at a time like this?
Can I do this without having to pre-define the hash as multi-layered up front?
I use strict; and so I have seen the
Can't use string ("string") as a HASH ref while "strict refs" in use at script.pl line <lineno>.
which I don't fully understand...
Edit:
Thank you Timur Shtatland, Polar Bear and Dave Cross for your great answers. In mentally parsing your suggestions it occurred to me that I had slightly misled you by being a little inconsistent in my original question. I apologize. I also think I see now why I saw the 'strict refs' error. I have made some changes.
Note that my first mention of the initial hash of filename is correct. The subsequent foreach examples looping through %files, however, were incorrect because I went from using file1 as the first file key to using settings1.ini as the first file key. I think this is why Perl threw the strict refs error - because I tried to change the key from the initial string to a hash_ref pointing to the sub-hash (or vice versa).
Have I understood that correctly?
There are several CPAN modules for parsing ini files. Study what is available and choose whichever suits you.
Otherwise you can write your own code, something in the spirit of the following snippet:
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
my @files = qw(settings1.ini settings2.ini settings3.ini);
my %hash;
for my $file (@files) {
$hash{$file} = read_settings($file);
}
say Dumper(\%hash);
sub read_settings {
my $fname = shift;
my %hash;
open my $fh, '<', $fname
or die "Couldn't open $fname";
while( <$fh> ) {
chomp;
my($k,$v) = split '=';
$hash{$k} = $v
}
close $fh;
return \%hash;
}
Output
$VAR1 = {
'settings1.ini' => {
'key2' => 'val2',
'key1' => 'val1',
'key3' => 'val3'
},
'settings2.ini' => {
'key2' => 'val5',
'key1' => 'val4',
'key3' => 'val6'
},
'settings3.ini' => {
'key1' => 'val7',
'key2' => 'val8',
'key3' => 'val9'
}
};
To build the hash one layer at a time, use anonymous hashes. Each value of %files here is a reference to a hash, for example, for $files{'settings1.ini'}:
# read the data into %new_hash, then:
$files{'settings1.ini'} = { %new_hash };
You do not need to predefine the hash as multi-layered (as hash of hashes) upfront.
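As a rough sketch of grafting sub-hashes one layer at a time: the %contents hash below is a hypothetical stand-in for the three files, so the example runs without touching the filesystem, and the regex is a simplified variant of the one in the question.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the contents of the three .ini files.
my %contents = (
    'settings1.ini' => "key1=val1\nkey2=val2\nkey3=val3\n",
    'settings2.ini' => "key1=val4\nkey2=val5\nkey3=val6\n",
    'settings3.ini' => "key1=val7\nkey2=val8\nkey3=val9\n",
);

my %files;    # starts empty; no nested structure is declared up front
for my $name (sort keys %contents) {
    # Capture key=value pairs line by line into a flat hash.
    my %new_hash = $contents{$name} =~ /^(\w+)=(.*)$/mg;
    # Graft a copy of it in as a sub-hash under the filename key.
    $files{$name} = { %new_hash };
}

print "$files{'settings2.ini'}{key3}\n";    # prints val6

for my $fkey (sort keys %files) {
    print "File: $fkey\n";
    for my $vkey (sort keys %{ $files{$fkey} }) {
        print "  $vkey: $files{$fkey}{$vkey}\n";
    }
}
```

Each top-level value is a hash reference, which is why the inner loop dereferences it with %{ ... } before calling keys.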
Also, avoid reinventing the wheel. Use Perl modules for common tasks; in this case, consider something like Config::IniFiles for parsing *.ini files.
SEE ALSO:
Anonymous hashes: perlreftut
Hashes of hashes: perldsc
Perl makes stuff like this ridiculously easy.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
my %files;
# <> reads from the files given on the command line
# one line at a time.
while (<>) {
chomp;
my ($key, $val) = split /=/;
# $ARGV contains the name of the file that
# is currently being read.
$files{$ARGV}{$key} = $val;
}
say Dumper \%files;
Running this as:
$ perl readconf settings1.ini settings2.ini settings3.ini
Gives the following output:
$VAR1 = {
'settings3.ini' => {
'key2' => 'val8',
'key1' => 'val7',
'key3' => 'val9'
},
'settings2.ini' => {
'key3' => 'val6',
'key1' => 'val4',
'key2' => 'val5'
},
'settings1.ini' => {
'key3' => 'val3',
'key1' => 'val1',
'key2' => 'val2'
}
};

Split multi-element string into hash

I know that one can easily split a string into a hash using map like this question How to do I split a string into hash keys with undef values?, or this Perl Monks thread. So, something like this very easily works:
my %table = map { chomp; split(/\t/) } <DATA>;
dd \%table;
__DATA__
#1245 banana
#3499 cherry
#5290 notebook
#2112 compact_disc
Of course, this would result in:
{
"#1245" => "banana",
"#2112" => "compact_disc",
"#3499" => "cherry",
"#5290" => "notebook",
}
If one had a more complicated table of data, though, and wanted to make a hash of arrays using the second column as the key, is this possible with map, or does one have to use the "longer" form:
my %table;
while(<DATA>) {
chomp(my @elems = split(/\t/));
$table{$elems[1]} = \@elems;
}
dd \%table;
__DATA__
shelf1 #1245 banana Dole
shelf1 #3499 cherry Acme
shelf2 #5290 notebook Staples
shelf3 #2112 compact_disc Mercury_Records
to make:
{
"#1245" => ["shelf1", "#1245", "banana", "Dole"],
"#2112" => ["shelf3", "#2112", "compact_disc", "Mercury_Records"],
"#3499" => ["shelf1", "#3499", "cherry", "Acme"],
"#5290" => ["shelf2", "#5290", "notebook", "Staples"],
}
I tried these two approaches, but neither seem to work, and I'm guessing it's not possible. But, just out of curiosity (and education) was wondering if one can do it a similar way.
my %table = map{ $_->[1] => @$_ } split(/\t/, <DATA>);
my %table = map{ split(/\t/); $_->[1], @$_ } <DATA>;
You can use map, but you need to move the split inside:
my %table = map { chomp; my @s = split /\t/; $s[1], \@s } <DATA>;
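A small self-contained sketch of the same idea, assuming tab-separated fields. The @lines array is a hypothetical stand-in for <DATA> so the example can be run as-is, and each line is copied before chomp so the input array is left untouched.

```perl
use strict;
use warnings;

# Hypothetical in-memory rows standing in for <DATA>; fields are tab-separated.
my @lines = (
    "shelf1\t#1245\tbanana\tDole\n",
    "shelf2\t#5290\tnotebook\tStaples\n",
);

# Each line yields one (key, array-ref) pair; "my @s" is a fresh array per line,
# so every reference points at its own row.
my %table = map {
    chomp(my $line = $_);
    my @s = split /\t/, $line;
    ( $s[1] => \@s );
} @lines;

print join(', ', @{ $table{'#5290'} }), "\n";    # prints shelf2, #5290, notebook, Staples
```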

Define Perl while() = each {} loop with inline hash

I am trying to define an inline hash in a while each loop, my program is not throwing any errors but also doesn't execute the print statement. Is it possible to define an inline hash like below:
while (my ($key, $value) = each %{ (apple => "red", orange => "orange", grape => "purple")}) {
print "something";
}
Alternatively, I can't get the while/each loop to work if I directly call a sub in the each statement that returns a hash, like the following:
sub returnsHash {
my %fruits = (
apple => "red",
orange => "orange",
grape => "purple",
);
return %fruits;
}
while (my ($key, $value) = each %{ returnsHash() }) {
print "something";
}
A list/comma operator in scalar context evaluates to the result of the last item evaluated in scalar context. That means
each %{ apple => "red", orange => "orange", grape => "purple" }
is equivalent to
each %{ "purple" }
This is what both of your snippets are doing, but it's undesired, and it's a strict violation. (Always use use strict; use warnings qw( all );!!!)
You are using a hash dereference (%{ ... }), but you have no hash, much less a reference to a hash that you can dereference. To build a hash and return a reference to the hash, use { ... }.
each %{ { apple => "red", orange => "orange", grape => "purple" } }
That solves one problem but reveals another: you get an endless loop.
The iterator used by each, keys and values is associated with the hash, not the operator. Since you are creating a new hash each time through the loop, you are creating a new iterator each time through the loop, so you will always get the first element of the newly created hash, and your loop will never end.
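The following sketch tries to make that behaviour visible. It uses a one-pair hash so the results are deterministic, and a hypothetical $rounds guard so the second loop terminates; without the guard it would spin forever.

```perl
use strict;
use warnings;

# A one-pair hash keeps the output deterministic.
my %h = (apple => 'red');

my @first  = each %h;    # ('apple', 'red')
my @second = each %h;    # ()  - the iterator is exhausted
my @third  = each %h;    # ('apple', 'red') - the iterator restarted

print scalar(@first), " ", scalar(@second), " ", scalar(@third), "\n";    # prints 2 0 2

# A fresh anonymous hash on every evaluation means a fresh iterator every
# time, so this loop would never end without the guard.
my $rounds = 0;
while (my ($k, $v) = each %{ +{ apple => 'red' } }) {
    last if ++$rounds >= 3;
}
print "stopped by the guard after $rounds rounds\n";    # prints 3 rounds
```

The state lives in %h itself, which is why the three calls on the same hash advance, finish and restart, while the loop over a freshly built hash only ever sees the first pair.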
Since you have no need to look up items by key, I don't see why you're using a hash at all. You could use the following instead:
for (
[ apple => "red" ],
[ orange => "orange" ],
[ grape => "purple" ],
) {
my ($key, $val) = @$_;
...
}
The following is how you'd write the above if you got the list from a sub.
use List::Util qw( pairs );
for (pairs(f())) {
my ($key, $val) = @$_;
...
}
Both of those create many arrays, though. Since there's no issue with being destructive, I would use the following which avoids the issue:
{
my @kvs = f();
while ( my ($key, $val) = splice(@kvs, 0, 2) ) {
...
}
}
You could also use the following, but I think many would be confused by it:
for (
my @kvs = f();
my ($key, $val) = splice(@kvs, 0, 2);
) {
...
}
That can't be done since each only works as intended with an actual variable, a hash or an array. See the synopsis in docs. Same goes for keys and values.
It is the same with the second attempt, where the function is also called anew in every iteration.
Note that a sensible thing to try would be %{ {...} } (and not %{ (...) }) since the thing inside %{} must be a hash reference. This applies to both attempts as the function returns a hash, whereby you get back a list of scalars. (This still wouldn't help, per the first statement.)
I am not sure what the need is for this, as a hash can be defined before the loop. Also, I'd like to suggest to carefully look at each before using it since it comes with complexities.
I take it that you want to iterate over key-value pairs of a dynamically created list of such pairs. Here is a way to do that using a custom iterator (which wraps the hash iterator used by each)
use warnings;
use strict;
use feature 'say';
my $hit = get_each_it(a => 1, b => 2, c => 3);
while (my ($k, $v) = $hit->()) {
say "$k => $v";
}
my ($k, $v) = $hit->(); # restarts the iterator
say "$k --> $v";
($k, $v) = $hit->(); # next (keeps state)
say "$k --> $v";
sub get_each_it {
my %h = @_;
return sub { return each %h }
}
The repeated and continued iteration (after the hash is exhausted or for individual calls) is a basic property of the hash iterator that each uses, and in doing that
So long as a given hash is unmodified you may rely on keys, values and each to repeatedly return the same order as each other.
Please study carefully how this works.
See this article on Perl.com about iterators, with a number of examples. A detailed discussion of iterators is given in Iterator module, along with a tutorial. I don't know the module well but docs are worth reading; each and every warning and caveat applies to each.
In case you don't need (or want) the capability to reset the iterator for continued iterations once the hash is exhausted, here is an alternative iterator from ikegami's comment using splice
sub get_each_it {
my @kv = @_;
return sub { return splice @kv, 0, 2 }
}
This doesn't get entangled with each and it also iterates in the order of the submitted list.
Note that by properties of a closure each code reference returned by the generator still retains its own iterator, which maintains its state when invoked from various pieces of code. Use with care.
Also see an introductory note on closure in perlfaq7
A possibility to do what you apparently want to do, would be using a for loop. The (not so) anonymous hash goes in the initialisation expression. The test expression is the assignment of each and the iteration expression stays empty.
#!/usr/bin/perl
use strict;
use warnings;
for (my %h = (
apple => 'red',
orange => 'orange',
grape => 'purple',
);
my ($key, $value) = each(%h);
) {
print("$key: $value\n");
}
This can actually be seen as a sort of a while loop with an initialisation local to the loop. %h is only in scope in the loop. So it's not anonymous to the for loop and can be used with each but ceases to exist after the loop is done.
The while loop re-evaluates its condition on every iteration, so call the sub once, keep the returned hash reference in a variable, and iterate over that:
sub returnsHash {
my %fruits = (
apple => "red",
orange => "orange",
grape => "purple",
);
return \%fruits;
}
my $f = returnsHash() ;
while ( my ($key, $value) = each %{ $f } ) {
print "$key => $value\n";
}
Alternatively, what is needed is a foreach (the multi-variable form requires Perl 5.36):
use v5.36 ;
no warnings qw(experimental::for_list);
sub returnsHash {
my %fruits = (
apple => "red",
orange => "orange",
grape => "purple",
);
return %fruits;
}
foreach my ($key, $value) ( returnsHash() ) {
print "$key => $value\n";
}

Accessing and modifying a nested hash based on a dot separated string

I have a string as input, say apple.mango.orange = 100
I also have a hash reference:
$inst = {
'banana' => 2,
'guava' => 3,
'apple' => {
'mango' => {
'orange' => 80
}
}
};
I want to modify the value of orange using the input string. Can someone please help me how I could do this?
I tried splitting the string into (key, value) pair. I then did the following on the key string:
my $key2 = "\$inst->{".$key."}";
$key2 =~ s/\./}->{/g;
$$key2 = $value;
This does not work as intended. Can someone help me out here? I have read the Perl FAQ about not using a variable value as variable but I am unable to think of an alternative.
You are building a string that consists of (buggy) Perl code, but you never ask Perl to execute it. In any case, that's not the right approach.
sub dive_val :lvalue {
my $p = \shift;
$p = \($$p->{$_}) for @_;
$$p
}
my @key = split /\./, "apple.mango.orange";
dive_val($inst, @key) = $value;
or
use Data::Diver qw( DiveVal );
my @key = split /\./, "apple.mango.orange";
DiveVal($inst, map \$_, @key) = $value;
Not only is a symbolic reference a very bad idea here, it doesn't even solve your problem. You're building an expression in $key2, and just jamming another dollar sign in front of its name won't make perl execute that code. For that you would need eval, which is another bad idea
You can install and use the Data::Diver module, which does exactly this sort of thing, or you can simply loop over the list of hash keys, picking up a new hash reference each time and assigning the value to the element with the last key
The biggest issue is actually parsing the incoming string into a list of keys and a value. This code implements a subroutine apply which applies the implied operation in the string to a nested hash. Unless you are confident of your data, it needs some error checking added to make sure each of the keys in the list exists. The Data::Dumper output is just to demonstrate the validity of the result.
use strict;
use warnings 'all';
use Data::Dumper;
my $inst = { 'banana' => 2, 'guava' => 3, 'apple' => { 'mango' => { 'orange' => 80 } } };
my $s = 'apple.mango.orange = 100';
apply($s, $inst);
print Dumper $inst;
sub apply {
my ($operation, $data) = @_;
my ($keys, $val) = $operation =~ /([\w.]+)\s*=\s*(\d+)/;
my @keys = split /\./, $keys;
my $last = pop @keys;
my $hash = $data;
$hash = $hash->{$_} for @keys;
$hash->{$last} = $val;
}
output
$VAR1 = {
'banana' => 2,
'apple' => {
'mango' => {
'orange' => '100'
}
},
'guava' => 3
};
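One possible way to add the error checking mentioned above is sketched here. The name set_path and the choice to die on a missing key are assumptions made for illustration, not part of the answer; a real program might prefer to warn or to create the missing levels.

```perl
use strict;
use warnings;

# Hypothetical helper: set a value at a dotted path, refusing to create
# anything that is not already present in the nested hash.
sub set_path {
    my ($data, $path, $val) = @_;
    my @keys = split /\./, $path;
    my $last = pop @keys;
    my $hash = $data;
    for my $key (@keys) {
        # Every intermediate element must already be a hash reference.
        die "No nested hash at '$key' in path '$path'\n"
            unless ref($hash->{$key}) eq 'HASH';
        $hash = $hash->{$key};
    }
    die "No key '$last' at end of path '$path'\n"
        unless exists $hash->{$last};
    $hash->{$last} = $val;
    return $data;
}

my $inst = { banana => 2, guava => 3, apple => { mango => { orange => 80 } } };

set_path($inst, 'apple.mango.orange', 100);
print "$inst->{apple}{mango}{orange}\n";    # prints 100

# A path that does not exist is reported instead of silently autovivified.
eval { set_path($inst, 'apple.kiwi.orange', 1) };
print "rejected: $@" if $@;
```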

Reversing a hash preserving Duplicate values

My existing hash is
my $volvg = {
'datavol' => 'oradatavg',
'archvol' => 'archvg',
'archvol1' => 'archvg',
'soevol' => 'soevg',
'redovol' => 'oradatavg'
};
I want to reverse the hash in following way
$vgvol = { 'oradatavg' => [
'datavol',
'redovol'
], 'archvg' => [
'archvol',
'archvol1'
], 'soevg' => [
'soevol'
] };
Can someone help?
Reversing in place is probably a bad idea. Create a new hash, and then assign it to the old one if you need to.
Below is one way to do it:
#!/usr/bin/perl
use strict;
use warnings;
my $volvg = {
'datavol' => 'oradatavg',
'archvol' => 'archvg',
'archvol1' => 'archvg',
'soevol' => 'soevg',
'redovol' => 'oradatavg'
};
my $reversed;
while( my( $k, $v)= each %$volvg)
{ # $reversed->{$v}||=[]; # not needed, see dgw's comment below
push @{$reversed->{$v}}, $k; # push the old key into the array
}
use DDP; p $reversed; # for checking the result
# you can also use Data::Dumper or other modules
What's a little unclear in Perl, is how to embed arrays into hashes. Because pretty fundamentally - you can't. There's no such thing as a hash of arrays.
But what there is, is a hash of array references. You can manipulate an array ref like this:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $array_ref = [ "one", "two" ];
print Dumper \$array_ref;
push ( @$array_ref, "three-ish" );
print Dumper \$array_ref;
my $hash_ref;
$hash_ref->{"carrot"} = $array_ref;
print Dumper \$hash_ref;
push ( @{$hash_ref->{"carrot"}}, "a new value" );
print Dumper \$hash_ref;
So hopefully you can see the basis of what you need for creating the data structure you're looking for. Either extract the values and keys you want, create an array ref and insert them into a hash. Or iterate, and use push.
Inverting a hash is a classic question/recipe from the Perl Cookbook. The solution is trivial when the values being inverted into keys are unique.
%rev_hash = reverse %hash ;
but, as you note, that's often not the case, so the explanations and solutions offered in other answers here are necessary. Once you understand references it's not too hard (++ @Sobrique for making this link).
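To illustrate the difference, here is a short sketch using the data from the question (as a plain hash rather than a hash reference, which is an assumption made for brevity). A plain reverse keeps only one volume per volume group, while pushing onto array references keeps them all.

```perl
use strict;
use warnings;

my %volvg = (
    datavol  => 'oradatavg',
    archvol  => 'archvg',
    archvol1 => 'archvg',
    soevol   => 'soevg',
    redovol  => 'oradatavg',
);

# Plain inversion: duplicated values collide, so only one key survives each.
my %flat = reverse %volvg;
print "reverse kept ", scalar(keys %flat), " of ", scalar(keys %volvg), " volumes\n";
# prints: reverse kept 3 of 5 volumes

# Grouping inversion: every old key is pushed onto the array for its value.
my %grouped;
push @{ $grouped{ $volvg{$_} } }, $_ for sort keys %volvg;

for my $vg (sort keys %grouped) {
    print "$vg => ", join(', ', @{ $grouped{$vg} }), "\n";
}
```

Which of the colliding keys survives the plain reverse depends on hash ordering, so it should not be relied on.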
The Cookbook and other Perl resources sometimes recommend tie-ing the hash (cf. Tie::RefHash) to make working with the references easier.
If you are inverting a hash that itself contains references, it can be tricky if you have to go deep into the hash. Here's a simple example that should invert a hash that has an array reference as a value.
use DDP;
my $volvg = {
'datavol' => ['oradatavg', 'oradatavgpoo2'] ,
'archvol' => 'archvg',
'archvol1' => 'archvg',
'soevol' => 'soevg',
'redovol' => 'oradatavg' };
while ( ($k,$v) = each(%$volvg) ) {
if (ref $v) {
map { push @{$volvg_rev{$_}}, $k } @$v ;
}
else {
push @{$volvg_rev{$v}}, $k ;
}
}
p $volvg ;
print "----\n";
p %volvg_rev ;
Output:
\ {
archvol "archvg",
archvol1 "archvg",
datavol [
[0] "oradatavg",
[1] "oradatavgpoo2"
],
redovol "oradatavg",
soevol "soevg"
}
----
{
archvg [
[0] "archvol1",
[1] "archvol"
],
oradatavg [
[0] "redovol",
[1] "datavol"
],
oradatavgpoo2 [
[0] "datavol"
],
soevg [
[0] "soevol"
]
}
As an aside, Perl6 has some neat new methods for reverse-ing, inverting and flipping things around.