perl: getting value out of a hash using map - perl

It seems like I should be able to do this with map, but the actual details elude me.
I have a list of strings in an array, and either zero or one of them may have a hash value.
So instead of doing:
foreach $str ( #strings ) {
$val = $hash{$str} if $hash{$str};
}
Can this be replaced with a one-liner using map?

#values = grep { $_ } #hash{#strings};
to account for the fact that you only want true values.
Change this to
#values = grep { defined } #hash{#strings};
if you want to skip undefined values.

Sure, it'd be:
map { $val = $hash{$_} } #strings;
That is, each value of #strings is set in $_ in turn (instead of $str as in your foreach).
Of course, this doesn't do much, since you're not doing anything with the value of $val in your loop, and we aren't capturing the list returned by map.
If you're just trying to generate a list of values, that'd be:
#values = map { $hash{$_} } #strings;
But it's more concise to use a hash slice:
#values = #hash{#strings};
EDIT: As pointed out in the comments, if it's possible that #strings contains values that aren't keys in your hash, then #values will get undefs in those positions. If that's not what you want, see Hynek's answer for a solution.

I'm used to do it in this way:
#values = map { exists $hash{$_} ? $hash{$_} : () } #strings;
but I don't see anything wrong in this way
push #values, $hash{$_} for grep exists $hash{$_}, #strings;
or
#values = #hash{grep exists $hash{$_}, #strings};

map { defined $hash{$_} && ( $val = $hash{$_})} #strings;

Related

Perl list all keys in hash with identical values

If I have a colon-delimited file name FILE and I do:
cat FILE|perl -F: -lane 'my %hash = (); $hash{#F[0]} = #F[2]'
to assign the first and 3rd tokens as the key => value pairs for the hash..
1) Is that a sane way to assign key value pairs to a hash?
2) What is the simplest way to now find all keys with shared values and list them?
Assume FILE looks like:
Mike:34:Apple:Male
Don:23:Corn:Male
Jared:12:Apple:Male
Beth:56:Maize:Female
Sam:34:Apple:Male
David:34:Apple:Male
Desired Output: Keys with value "Apple": Mike,Jared,David,Sam
Your example won't work as you want because the -n option puts a while loop around your one-line program, so the hash you declare is created and destoyed for every record in the file. You could get around that by not declaring the hash, and so making it a persistent package variable which will retain all values stored in it.
You can then write push #{ $hash{$F[2]} }, $F[0] but notice that it should be $F[0] etc. and not #F[0], and I have used push to create a list of column 1 values for each column 3 value instead of just a list of one-to-one values relating each column 1 value with its column 3 value.
To clarify, your method produces a hash looking like this, which has to be searched to produce the display that you want.
(
Beth => "Maize",
David => "Apple",
Don => "Corn",
Jared => "Apple",
Mike => "Apple",
Sam => "Apple",
)
while mine creates this, which as you can see is pretty much already in the form you want.
(
Apple => ["Mike", "Jared", "Sam", "David"],
Corn => ["Don"],
Maize => ["Beth"],
)
But I think this problem is a bit too big to be solved with a one-line Perl program. The solution below expects the path to the input file as a command-line parameter, like this
> perl prog.pl colons.csv
but it will default to myfile.csv if no file is specified.
use strict;
use warnings;
our #ARGV = 'myfile.csv' unless #ARGV;
my %data;
while (<>) {
my #fields = split /:/;
push #{ $data{$fields[2]} }, $fields[0];
}
while (my ($k, $v) = each %data) {
next unless #$v > 1;
printf qq{Keys with value "%s": %s\n}, $k, join ', ', #$v;
}
output
Keys with value "Apple": Mike, Jared, Sam, David
use strict;
use warnings;
open my $in, '<', 'in.txt';
my %data;
while(<$in>){
chomp;
my #split = split/:/;
$data{$split[0]} = $split[2];
}
my $query = 'Apple';
print "Keys with value $query = ";
foreach my $name (keys %data){
print "$name " if $data{$name} eq $query;
}
print "\n";
Arrays are used to hold list of values, so use an array.
perl -F: -lane'
push #{ $h{$F[2]} }, $F[0];
END {
for my $fruit (keys %h) {
next if #{ $h{$fruit} } < 2;
print "$fruit: ", join(",", #{ $h{$fruit} });
}
}
' FILE
The END block is executed on exit. In it, we iterate over the keys of the hash. If the value of the current hash element is an array with only one element, it's skipped. Otherwise, we prints the key followed by contents of the array referenced by the hash element.
Here is another way:
perl -F: -lane'
push #{ $h{$F[2]} }, $F[0];
}{
print "$_: ", join(",", #{ $h{$_} }) for grep { #{$h{$_}} > 1 } keys %h;
' file
We read each line and create hash of arrays using third column as key and first column as list of values for matching key. In the END block we iterate over our hash using grep and filter keys whose array count greater than 1 and print the key followed by array elements.
It doesn't have to be a one liner,
Good. It's not going to be...
Is that a sane way to assign key value pairs to a hash?
You're simply assigning the key value pairs as:
$hash{"key"} = "value";
Which is about as simple as it gets. There might be a way of doing it via map. However, the main issue I see is what should happen if you have duplicate keys.
Let's say your file looks like this:
Mike:34:Apple:Male
Don:23:Corn:Male
Jared:12:Apple:Male
Beth:56:Maize:Female
Sam:34:Apple:Male
David:34:Apple:Male # Note this entry is here twice!
David:35:Wheat:Male # Note this entry is here twice!
Let's do a simple assignment loop:
my %hash;
while my $line ( <$fh> ) {
chomp $line;
my ($name, $age, $category, $sex) = split /:/, $line;
$hash{$name} = $category;
}
When you get to $hash{David}, it will first be set to Apple, but then you change the value to Wheat. There are four ways you can handle this:
Use whatever the last value is. No change in the loop.
Use the first value and ignore subsequent values. Simple enough to do.
If that happens, it's an error. Abort the program and report the error.
Keep all values.
This last one is the most interesting because it involves a reference to an array as the values for your hash:
my %hash;
while my $line ( <$fh> ) {
chomp $line;
my ($name, $age, $category, $sex) = split /:/, $line;
$hash{$name} = [] if not exists $hash{$name}; # I'm making this an array reference
push #{ $hash{$name} }, $category;
}
Now, each value in my hash is a reference to an array:
my #values = #{ $hash{David} ); # The values of David...
print "David is in categories " . join ( ", ", #values ) . "\n";
This will print out David is in categories Wheat, Apple
What is the simplest way to now find all keys with shared values and list them?
The easiest way is to create a second hash that's keyed by your value. In this hash, you will need to use an array reference. Let's assume no duplicate names for now:
my %hash;
my %indexed_hash;
while my $line ( <$fh> ) {
chomp $line;
my ($name, $age, $category, $sex) = split /:/, $line;
$hash{$name} = $category;
my $indexed_hash{$category} = [] if not exist $indexed_hash{$category};
push #{ $indexed_hash{$category} }, $name;
}
Now, if I want to find all the duplicates of Apple:
my #names = #{ $indexed_hash{Apple} };
print "The following are in 'Apple': " . join ( ", " #names ) . "\n";
Since we're getting into references, we could take things a step further and store all of your values of your file in your hash. Again, for simplicity, I am assuming that you will have one and only one entry per name:
my %hash;
while my $line ( <$fh> ) {
chomp $line;
my ($name, $age, $category, $sex) = split /:/, $line;
$hash{$name}->{AGE} = $age;
$hash{$name}->{CATEGORY} = $category;
$hash{$name}->{SEX} = $sex;
}
for my $name ( sort keys %hash ) {
print "$name Information:\n";
print " Age: " . $hash{$name}->{AGE} . "\n";
printf "Category: %s\n", $hash{$name}->{CATEGORY};
print " Sex: #{[$hash{$name}->{SEX}]}\n\n";
}
That last two statements are easier ways of interpolating complex data structures into a string. The printf is fairly clear. The second #{[...]} is a neat little trick.
What have you tried?
If you reverse the hash into a list of value => key pairs then use List::Util's pairs() against the list, you can transform the hash into a hash of values => key arrayrefs. i.e. ( foo => [ 'bar', 'baz' ] ), grep {#{$hash{$_}} > 1} keys %hash, and print the results.

Is there a simple way to validate a hash of hash element comparsion?

Is there a simple way to validate a hash of hash element comparsion ?
I need to validate a Perl hash of hash element $Table{$key1}{$key2}{K1}{Value} compare to all other elements in hash
third key will be k1 to kn and i want comprare those elements and other keys are same
if ($Table{$key1}{$key2}{K1}{Value} eq $Table{$key1}{$key2}{K2}{Value}
eq $Table{$key1}{$key2}{K3}{Value} )
{
#do whatever
}
Something like this may work:
use List::MoreUtils 'all';
my #keys = map "K$_", 1..10;
print "All keys equal"
if all { $Table{$key1}{$key2}{$keys[1]}{Value} eq $Table{$key1}{$key2}{$_}{Value} } #keys;
I would use Data::Dumper to help with a task like this, especially for a more general problem (where the third key is more arbitrary than 'K1'...'Kn'). Use Data::Dumper to stringify the data structures and then compare the strings.
use Data::Dumper;
# this line is needed to assure that hashes with the same keys output
# those keys in the same order.
$Data::Dumper::Sortkeys = 1;
my $string1= Data::Dumper->Dump($Table{$key1}{$key2}{k1});
for ($n=2; exists($Table{$key1}{$key2}{"k$n"}; $n++) {
my $string_n = Data::Dumper->Dump($Table{$key1}{$key2}{"k$n"});
if ($string1 ne $string_n) {
warn "key 'k$n' is different from 'k1'";
}
}
This can be used for the more general case where $Table{$key1}{$key2}{k7}{value} itself contains a complex data structure. When a difference is detected, though, it doesn't give you much help figuring out where that difference is.
A fairly complex structure. You should be looking into using object oriented programming techniques. That would greatly simplify your programming and the handling of these complex structures.
First of all, let's simplify a bit. When you say:
$Table{$key1}{$key2}{k1}{value}
Do you really mean:
my $value = $Table{$key1}->{$key2}->{k1};
or
my $actual_value = $Table{$key1}->{$key2}->{k1}->{Value};
I'm going to assume the first one. If I'm wrong, let me know, and I'll update my answer.
Let's simplify:
my %hash = %{$Table{$key1}->{$key2}};
Now, we're just dealing with a hash. There are two techniques you can use:
Sort the keys of this hash by value, then if two keys have the same value, they will be next to each other in the sorted list, making it easy to detect duplicates. The advantage is that all the duplicate keys would be printed together. The disadvantage is that this is a sort which takes time and resources.
Reverse the hash, so it's keyed by value and the value of that key is the key. If a key already exists, we know the other key has a duplicate value. This is faster than the first technique because no sorting is involved. However, duplicates will be detected, but not printed together.
Here's the first technique:
my %hash = %{$Table{$key1}->{$key2}};
my $previous_value;
my $previous_key;
foreach my $key (sort {$hash{$a} cmp $hash{$b}} keys %hash) {
if (defined $previous_key and $previous_value eq $hash{$key}) {
print "\$hash{$key} is a duplicate of \$hash{$previous_key}\n";
}
$previous_value = $hash{$key};
$previous_key = $key;
}
And the second:
my %hash = %{$Table{$key1}->{$key2}};
my %reverse_hash;
foreach $key (keys %hash) {
my $value = $hash{$key};
if (exists $reverse_hash{$value}) {
print "\$hash{$reverse_hash{$value}} has the same value as \$hash{$key}\n";
}
else {
$reverse_hash{$value} = $key;
}
}
Alternative approach to the problem is make utility function which will compare all keys if has same value returned from some function for all keys:
sub AllSame (&\%) {
my ($c, $h) = #_;
my #k = keys %$h;
my $ref;
$ref = $c->() for $h->{shift #k};
$ref ne $c->() and return for #$h{#k};
return 1
}
print "OK\n" if AllSame {$_->{Value}} %{$Table{$key1}{$key2}};
But if you start thinking in this way you can found this approach much more generic (recommended way):
sub AllSame (#) {
my $ref = shift;
$ref ne $_ and return for #_;
return 1
}
print "OK\n" if AllSame map {$_->{Value}} values %{$Table{$key1}{$key2}};
If mapping operation is expensive you can make lazy counterpart of same:
sub AllSameMap (&#) {
my $c = shift;
my $ref;
$ref = $c->() for shift;
$ref ne $c->() and return for #_;
return 1
}
print "OK\n" if AllSameMap {$_->{Value}} values %{$Table{$key1}{$key2}};
If you want only some subset of keys you can use hash slice syntax e.g.:
print "OK\n" if AllSame map {$_->{Value}} #{$Table{$key1}{$key2}}{map "K$_", 1..10};

What's the best practise for Perl hashes with array values?

What is the best practise to solve this?
if (... )
{
push (#{$hash{'key'}}, #array ) ;
}
else
{
$hash{'key'} ="";
}
Is that bad practise for storing one element is array or one is just double quote in hash?
I'm not sure I understand your question, but I'll answer it literally as asked for now...
my #array = (1, 2, 3, 4);
my $arrayRef = \#array; # alternatively: my $arrayRef = [1, 2, 3, 4];
my %hash;
$hash{'key'} = $arrayRef; # or again: $hash{'key'} = [1, 2, 3, 4]; or $hash{'key'} = \#array;
The crux of the problem is that arrays or hashes take scalar values... so you need to take a reference to your array or hash and use that as the value.
See perlref and perlreftut for more information.
EDIT: Yes, you can add empty strings as values for some keys and references (to arrays or hashes, or even scalars, typeglobs/filehandles, or other scalars. Either way) for other keys. They're all still scalars.
You'll want to look at the ref function for figuring out how to disambiguate between the reference types and normal scalars.
It's probably simpler to use explicit array references:
my $arr_ref = \#array;
$hash{'key'} = $arr_ref;
Actually, doing the above and using push result in the same data structure:
my #array = qw/ one two three four five /;
my $arr_ref = \#array;
my %hash;
my %hash2;
$hash{'key'} = $arr_ref;
print Dumper \%hash;
push #{$hash2{'key'}}, #array;
print Dumper \%hash2;
This gives:
$VAR1 = {
'key' => [
'one',
'two',
'three',
'four',
'five'
]
};
$VAR1 = {
'key' => [
'one',
'two',
'three',
'four',
'five'
]
};
Using explicit array references uses fewer characters and is easier to read than the push #{$hash{'key'}}, #array construct, IMO.
Edit: For your else{} block, it's probably less than ideal to assign an empty string. It would be a lot easier to just skip the if-else construct and, later on when you're accessing values in the hash, to do a if( defined( $hash{'key'} ) ) check. That's a lot closer to standard Perl idiom, and you don't waste memory storing empty strings in your hash.
Instead, you'll have to use ref() to find out what kind of data you have in your value, and that is less clear than just doing a defined-ness check.
I'm not sure what your goal is, but there are several things to consider.
First, if you are going to store an array, do you want to store a reference to the original value or a copy of the original values? In either case, I prefer to avoid the dereferencing syntax and take references when I can:
$hash{key} = \#array; # just a reference
use Clone; # or a similar module
$hash{key} = clone( \#array );
Next, do you want to add to the values that exist already, even if it's a single value? If you are going to have array values, I'd make all the values arrays even if you have a single element. Then you don't have to decide what to do and you remove a special case:
$hash{key} = [] unless defined $hash{key};
push #{ $hash{key} }, #values;
That might be your "best practice" answer, which is often the technique that removes as many special cases and extra logic as possible. When I do this sort of thing in a module, I typically have a add_value method that encapsulates this magic where I don't have to see it or type it more than once.
If you already have a non-reference value in the hash key, that's easy to fix too:
if( defined $hash{key} and ! ref $hash{key} ) {
$hash{key} = [ $hash{key} ];
}
If you already have non-array reference values that you want to be in the array, you do something similar. Maybe you want an anonymous hash to be one of the array elements:
if( defined $hash{key} and ref $hash{key} eq ref {} ) {
$hash{key} = [ $hash{key} ];
}
Dealing with the revised notation:
if (... )
{
push (#{$hash{'key'}}, #array);
}
else
{
$hash{'key'} = "";
}
we can immediately tell that you are not following the standard advice that protects novices (and experts!) from their own mistakes. You're using a symbolic reference, which is not a good idea.
use strict;
use warnings;
my %hash = ( key => "value" );
my #array = ( 1, "abc", 2 );
my #value = ( 22, 23, 24 );
push(#{$hash{'key'}}, #array);
foreach my $key (sort keys %hash) { print "$key = $hash{$key}\n"; }
foreach my $value (#array) { print "array $value\n"; }
foreach my $value (#value) { print "value $value\n"; }
This does not run:
Can't use string ("value") as an ARRAY ref while "strict refs" in use at xx.pl line 8.
I'm not sure I can work out what you were trying to achieve. Even if you remove the 'use strict;' warning, the code shown does not detect a change from the push operation.
use warnings;
my %hash = ( key => "value" );
my #array = ( 1, "abc", 2 );
my #value = ( 22, 23, 24 );
push #{$hash{'key'}}, #array;
foreach my $key (sort keys %hash) { print "$key = $hash{$key}\n"; }
foreach my $value (#array) { print "array $value\n"; }
foreach my $value (#value) { print "value $value\n"; }
foreach my $value (#{$hash{'key'}}) { print "h_key $value\n"; }
push #value, #array;
foreach my $key (sort keys %hash) { print "$key = $hash{$key}\n"; }
foreach my $value (#array) { print "array $value\n"; }
foreach my $value (#value) { print "value $value\n"; }
Output:
key = value
array 1
array abc
array 2
value 22
value 23
value 24
h_key 1
h_key abc
h_key 2
key = value
array 1
array abc
array 2
value 22
value 23
value 24
value 1
value abc
value 2
I'm not sure what is going on there.
If your problem is how do you replace a empty string value you had stored before with an array onto which you can push your values, this might be the best way to do it:
if ( ... ) {
my $r = \$hash{ $key }; # $hash{ $key } autoviv-ed
$$r = [] unless ref $$r;
push #$$r, #values;
}
else {
$hash{ $key } = "";
}
I avoid multiple hash look-ups by saving a copy of the auto-vivified slot.
Note the code relies on a scalar or an array being the entire universe of things stored in %hash.

Iterate through hash values in perl

I've got a hash with let's say 20 values.
It's initialized this way:
my $line = $_[0]->[0];
foreach my $value ($line) {
print $value;
}
Now when I try to get the value of each hash in $line it says:
Use of uninitialized value in print at file.pl line 89
Is there a way to iterate through each value of a hash?
I also tried it with:
my %line = $_[0]->[0];
foreach my $key (keys %line) {
print %line->{$key};
}
But that is also not working:
Reference found where even-sized list expected at file.pl at line 89
Anybody knows what to do? It shouldn't be that difficult...
To iterate over values in a hash:
for my $value (values %hash) {
print $value;
}
$line in your first example is a scalar, not a hash.
If it's a hash reference, dereference it with %{$line}.
First, you must understand the difference between a hash, and a hash reference.
Your initial assignment $_[0]->[0] means something like : Takes the first argument of the current function ($_[0]), dereference it (->) and consider it is an array and retrieves it's first value ([0]). That value can not be a list or a hash, it must be a scalar (string, int, float, reference).
Here is some example:
my %hash = ( MyKey => "MyValue");
my $hashref = \%hash;
# The next line print all elements of %hash
foreach (keys %hash) { print $_ }
# And is equivalent to
foreach (keys %{$hashref}) { print $_ }
$hash{MyKey} == $hashref->{MyKey}; # is true
Please refer to http://perldoc.perl.org/perlreftut.html for further details.
The warning is telling you that there nothing at $_[0]->[0]. It's not dying and telling you that you're indexing nothing, so $_[0] is likely an arrayref, but nothing is in the first slot--or perhaps it's pointing to an empty array.
Were it a empty string or a 0, it wouldn't complain.
Were there any reference there, you could print something even if only: BLAH(0x80af74). (Where "BLAH" is one of "ARRAY", "HASH", "SCALAR", "REF", "GLOB", "IO", ... )
My suggestion is that you do this:
use Data::Dumper;
say Data::Dumper->Dump( [ $_[0] ] ); # or even say Data::Dumper->Dump( [ \#_ ] )
and then look at the output.
Once you've got a hashref at $_[0]->[0], then if you must loop through the hash, the best way is:
while ( my ( $key, $value ) = each %$hashref ) {
do_stuff_with_key_and_value( $key, $value );
}
see each
Lastly, it seems that you have some sigil confusion. See the last part of this link for a decent attempt to explain that sigils ( '$', '#', '%' ) are not part of the name of a variable, but indicators what we want retrieved from it. Perl compilation woes

Selectively counting delimited field values and creating a hash using map

I have a pipe delimited text file containing, among other things, a date and a number indicating the lines sequence elsewhere in the program. What I'm hoping to do is from that file create a hash using the year as the key and the value being the maximum sequence for that year (I essentially need to implement an auto-incremented key per year) e.g from
2000|1
2003|9
2000|5
2000|21
2003|4
I would end with a hash like:
%hash = {
2000 => 21,
2003 => 9
}
I've managed to split the file into the year and sequence parts (not very well I think) like so:
my #dates = map {
my #temp = split /\|/;
join "|", (split /\//, $temp[1])[-1], $temp[4] || 0; #0 because some records
#mightn't have a sequence
} #info
Is there something succint I could do to create a hash using that data?
Thanks
If I understand you, you were almost there. All you needed to do was return the key and value from map and sort by sequence instead of joining them.
my %hash =
map #$_,
sort { $a->[1] <=> $b->[1] }
map {
my #temp = split /\|/;
my $date = (split /\//, $temp[1])[-1];
my $seq = $temp[4] || 0; #0 because some records mightn't have a sequence
[ $date, $seq ]
} #info;
But just iterating through with for and setting hash only if the current sequence
is higher than the previous maximum for that date is probably a better idea.
Be careful with those {}; where you said
%hash = {
2000 => 21,
2003 => 9
}
you meant () instead (or to be assigning to a reference $hash), since the {} there create an anonymous hash and return a reference to it.
Here's how you could write that .. not too sure why you want/need to use map (please explain)
#!/usr/bin/perl -w
use strict;
use warnings;
my %hash;
while(<DATA>) {
chomp();
my ($year,$sequence)=split('\|');
$sequence = 0 unless (defined ($sequence));
next if (exists $hash{$year} and $sequence < $hash{$year});
$hash{$year}=$sequence;
}
__DATA__
2000|1
2003|9
2000|5
2000|21
2003|4
I added the $sequence = 0 unless defined ($sequence); because of that comment in your snippet. I believe I might understand your intent there.. (either the input format is valid/consistent, or it is not ..)
map operates on each item in a list and builds a list of results to pass on. So, you can't really do the sort of checks you want (keep the maximum sequence value) as you go unless you build a scratch hash that winds up containing exactly the data you are trying to build as the return value of the `map.
my %results = map {
my( $y, $s ) = split '[|]', $_;
seq_is_gt_year_seq( $y, $s )
? ( $y, $s )
: ();
} #year_pipe_seq;
To implement seq_is_gt_year_seq() we wind up having to build a temporary hash that stores each year and its max sequence value for lookup.
You should use an approach that builds the lookup incrementally, like a for or while loop.
map { BLOCK } LIST always usually (unless BLOCK sometimes evaluates to an empty list) returns a list that is least as large as LIST, and may not be the way to go if you do want to simply overwrite duplicate keys with the latest data. Something like:
my %hash;
for (#info) {
my #temp = split /\|/;
my $key = (split /\//, $temp[1]);
my $value = $temp[4] || 0;
$hash{$key} = $value unless defined $hash{$key} && $hash{$key}>=$value;
}
will work. The last line conditionally updates the hash table, which is something you can't do (or at least can't do very conveniently) inside a map statement.
If there's any chance you can perform this processing as the file is read, then I'd do it. Something like this:
my %year_count;
while (my $line = <$fh>){
chomp $line;
my ($year, $num) = split /\|/, $line;
if ($num > $year_count{$year} || !defined $year_count{$year})
$year_count{$year} = $num;
}
}
if you want to use an array, map isn't really the best choice (since you're not transforming the list, you're processing it down to something different). To be honest the most sensible array-processing would probably be the same as the above, but in a foreach instead:
my %year_count;
foreach my $line (#info){
my ($year, $num) = split /\|/, $line;
if ($num > $year_count{$year} || !defined $year_count{$year})
$year_count{$year} = $num;
}
}