Quickly filter a perl hash of hashes - perl

I have a perl hash of hashes like the following:
$VAR1 = {
'ID_1' => {
'FILE_B' => '/path/to/file/file1',
'FILE_C' => '/path/to/file/file2',
'FILE_A' => '/path/to/file/file3'
},
'ID_2' => {
'FILE_B' => '/path/to/file/file4',
'FILE_A' => '/path/to/file/file5'
},
'ID_3' => {
'FILE_B' => '/path/to/file/file6',
'FILE_A' => '/path/to/file/file7'
}
...
}
I would like to get a list of all keys of members in the main hash that have FILE_C defined. In the example, this will return only ID_1.
I know how to do this in a cumbersome loop (iterating all keys, checking if FILE_C is defined, if so — pushing the key to an array, finally returning this array), but I have a feeling there's a single-liner or even a function for this …

Yep, perl has the grep function:
my @keys = grep { defined $hash{$_}{FILE_C} } keys %hash;
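One subtlety is that defined and exists answer slightly different questions. The sketch below uses made-up data; the ID_4 entry with an undef FILE_C is an assumption added only to show the difference, it is not from the question.

```perl
use strict;
use warnings;

# Hypothetical sample data; ID_4 is added only to show the undef case.
my %hash = (
    ID_1 => { FILE_B => '/path/to/file/file1', FILE_C => '/path/to/file/file2' },
    ID_2 => { FILE_B => '/path/to/file/file4' },
    ID_4 => { FILE_C => undef },
);

# Keys whose FILE_C has a defined value.
my @defined_ids = grep { defined $hash{$_}{FILE_C} } sort keys %hash;

# Keys that have a FILE_C entry at all, even if its value is undef.
my @existing_ids = grep { exists $hash{$_}{FILE_C} } sort keys %hash;

print "defined: @defined_ids\n";
print "exists:  @existing_ids\n";
```

If an entry can be present with an undef value, pick whichever test matches what "has FILE_C" should mean for your data.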

Related

How to merge two hashes returned from sub, in Perl

I am trying to merge two hashes. But they are return values from functions. How can I dereference the return values inline? I don't want to use extra variables such as my $pos = makePos();
use v5.8.8;
use strict;
use warnings;
sub _makePos
{
my $desc= {
pos50 => {unit => 'm', desc => 'position error' },
pos95 => {unit => 'm', desc => '95% position error' }
};
return $desc;
}
sub _makeVel
{
my $desc= {
vel50 => {unit => 'm/s', desc => 'velocity error' },
vel95 => {unit => 'm/s', desc => '95% velocity error' }
};
return $desc;
}
my $descs = {_makePos(), _makeVel()};
use Data::Dumper;
print Dumper($descs);
This prints only the hash returned from _makeVel. How does it work?
$VAR1 = {
'HASH(0x21ea4a0)' => {
'vel50' => {
'desc' => 'velocity error',
'unit' => 'm/s'
},
'vel95' => {
'unit' => 'm/s',
'desc' => '95% velocity error'
}
}
};
Changing this line to
my $descs = {%{_makePos()}, %{_makeVel()}};
worked!
Actually, your original solution did print both of the hashes, but the first one was "stringified" as it was being used as the key of your hash. It's there as HASH(0x21ea4a0).
I see you have a solution, but it might be worth explaining what was going wrong and why your solution fixed it.
Your two subroutines don't return hashes but, rather, hash references. A hash reference is a scalar value that is, effectively, a pointer to a hash.
A hash is created from a list of values. Your code that creates your new hash (actually, once again, a hash reference) is this:
my $descs = {_makePos(), _makeVel()};
This is two scalar values. The first is used as a key in the new hash and the second is used as the associated value - hence the results you get from Data::Dumper.
What you actually want to do is to "dereference" your hashes and get back to the actual hashes. You can dereference a hash using the syntax %{ ... }, where the ... is any expression returning a hash reference. And that's what you've done in your solution. You dereference the hash references, which gives you a list of key/value pairs. The pairs from the two dereferenced hashes are then joined together in a single list which is used to create your new, combined, hash.
I should point out that there's a danger in this approach. If your two subroutines can ever return references to hashes that contain the same key, then only one version of that repeated key will appear in the combined hash.
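A minimal sketch of that overwrite behaviour, using two hypothetical subs (make_first and make_second are illustrative names, not from the question) that both return a key called shared:

```perl
use strict;
use warnings;

# Hypothetical subs that both return a hash reference containing 'shared'.
sub make_first  { return { a => 1, shared => 'from first'  }; }
sub make_second { return { b => 2, shared => 'from second' }; }

# Later pairs in the list overwrite earlier ones with the same key.
my $merged = { %{ make_first() }, %{ make_second() } };

# One way to spot collisions before merging, if silent overwrites matter.
my ($first, $second) = (make_first(), make_second());
my @collisions = grep { exists $second->{$_} } sort keys %$first;

print "shared = $merged->{shared}\n";
print "collisions: @collisions\n";
```

The value from the hash listed last wins, so the order of the two dereferenced hashes decides which version survives.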

How do I access the hash represented by this dumper output?

I am hacking a git-svn Perl script. I have a $paths variable which I think contains an array of individual paths, but I am having a hard time iterating over it. My end goal is to add an additional attribute to one path.
Here is the dumper output.
{
"/dira" => {
action => "A",
copyfrom_path => undef,
copyfrom_rev => -1
},
"/dira/dirb" => {
action => "A",
copyfrom_path => undef,
copyfrom_rev => -1
},
"/dira/dirb/test.55mb.file" => {
action => "A",
copyfrom_path => undef,
copyfrom_rev => -1
},
}
According to that output, $paths is a reference to a hash of references to hashes.
If you know which path you want to extend, you don't need to iterate:
$paths->{'/foo/bar'}{'my_attribute'} = 42;
If you want to do this uniformly to all paths, you can do this:
for my $attrs (values %$paths) {
$attrs->{'my_attribute'} = 42;
}
See perldoc perldata for information about hashes and perldoc perlreftut for references and nested data structures.
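As a rough sketch of both ideas together, assuming a cut-down copy of the dumper output and the my_attribute name used above:

```perl
use strict;
use warnings;

# Sample data in the shape of the dumper output above.
my $paths = {
    '/dira'      => { action => 'A', copyfrom_path => undef, copyfrom_rev => -1 },
    '/dira/dirb' => { action => 'A', copyfrom_path => undef, copyfrom_rev => -1 },
};

# Guard with exists so a mistyped path does not autovivify a new entry.
my $target = '/dira/dirb';
$paths->{$target}{my_attribute} = 42 if exists $paths->{$target};

# Walk every path together with its attribute hash.
for my $path (sort keys %$paths) {
    my $attrs = $paths->{$path};
    my $extra = exists $attrs->{my_attribute} ? $attrs->{my_attribute} : 'none';
    print "$path: action=$attrs->{action} my_attribute=$extra\n";
}
```

Iterating over keys rather than values is useful when the path itself decides which entries to change.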

Iterate over a nested hash in perl

I have to iterate over a nested hash in perl and carry out some operations. The structure I have is
$featureGroup = [
{
featureType => "widget",
name => "dpx-shadow-fleet",
parameterMap => { dpxContext => "shadowAtf", dpxEndPoint => "/art/dp/ppd?" },
},
{
featureType => "widget",
name => "dpx-shadow-fleet",
parameterMap => { dpxContext => "shadowBtf", dpxEndPoint => "/art/dp/btf?" },
},
{
features => [
{
featuredesc => [
{
critical => 1,
featureType => "widget",
name => "dpx-ppd",
parameterMap => { dpxContext => "atf", dpxEndPoint => "/art/dp/" },
},
{
featureType => "widget",
name => "error",
parameterMap => { errorMessageId => "error" },
},
],
featureType => "sequence",
},
{
critical => 1,
features => ["encode-landing-image", "image-encoding-error"],
featureType => "sequence",
},
],
handler => "/gp/product/features/embed-landing-image.mi",
name => "embed-landing-image",
pfMetrics => { "" => undef, "start" => sub { "DUMMY" }, "stop" => sub { "DUMMY" } },
type => "custom-grid",
},
];
I want to iterate over the featuredesc subarray and get the value name. I am trying out this.
for(my $i = 0; $i < @$featureGroup; $i++){
if(defined $featureGroup->[$i]->{'features'}){
for(my $j = 0; $j < @$featureGroup->[$i]->{'features'} ; $j++){
print "$featureGroup->[$i]->{'features'}->{'featuredesc}->{name}";
}
}
}
But this is not working. I don't understand where I am going wrong. Any pointers in the right direction would be useful.
You have a very complex data object there and you have already encountered problems dealing with it. While I could help you address your direct problem, I think you would benefit more from learning how to reduce the complexity.
Perl supports object-oriented programming. This allows you to take data structures and attach subroutines that operate on them. You can read about Perl OO here. I will show you quickly how you can turn the $featureGroup list into a list of objects, and how to access the features that a single object contains. You should apply this technique to every hash in your data structure (you can tone it back if you are sure that certain inner hashes should not be objects, but it is probably better to start by overdoing it and then scale back rather than the other way around).
This is one of the feature group hashes:
{
'featureType' => 'widget',
'name' => 'dpx-shadow-fleet',
'parameterMap' => {
'dpxContext' => 'shadowAtf',
'dpxEndPoint' => '/art/dp/ppd?'
}
}
In this one you have a featureType, name, and parameterMap. These fields do not appear in every object in your list (in fact the last hash looks quite different to the first two). I will show you how to create an object which requires those three parameters:
package Feature;
use Moose; # You may have to install this
has 'featureType' => (
'is' => 'rw',
'isa' => 'Str'
);
has 'name' => (
'is' => 'rw',
'isa' => 'Str'
);
has 'parameterMap' => (
'is' => 'rw',
'isa' => 'HashRef'
# You could make this accept another object type
# if you convert this inner hash
);
You can then construct your object like so:
my $f = Feature->new(
'featureType' => 'widget',
'name' => 'dpx-shadow-fleet',
'parameterMap' => {
'dpxContext' => 'shadowAtf',
'dpxEndPoint' => '/art/dp/ppd?'
}
);
You are then able to access those fields by using the named accessors:
print $f->name; # dpx-shadow-fleet
At the moment this just seems like a longer way to use a hash, right? Well, the real benefit comes from being able to define arbitrary subroutines on the class which hide complexity from the caller. So you want to operate on the features array in your original question. Let's define that as a field:
has features => (
is => 'rw',
isa => 'ArrayRef[HashRef]'
# This is an array containing hashes
# You _really_ want to turn the inner hashes into an object here!
);
Then we can operate on them in another subroutine. Let's define one that returns every feature that is a sequence (has a featureType of sequence):
sub get_sequences {
my ($self) = @_;
return grep { $_->{featureType} eq 'sequence' } @{ $self->features };
}
Now when you use an object of this type to get the sequence features all you need to do is:
$f->get_sequences();
If you apply this to all levels of your hash you will find that your code becomes easier to manage. Good luck!
Try this:
for(my $i = 0; $i < @$featureGroup; $i++){
if(defined $featureGroup->[$i]->{'features'}){
for(my $j = 0; $j<scalar @{$featureGroup->[$i]->{'features'}} ; $j++){
for(my $k=0;$k<scalar @{$featureGroup->[$i]->{'features'}->[$j]->{'featuredesc'}};$k++) {
if (defined $featureGroup->[$i]->{'features'}->[$j]->{'featuredesc'}->[$k]->{'name'}) {
print $featureGroup->[$i]->{'features'}->[$j]->{'featuredesc'}->[$k]->{'name'}."\n";
}
}
last if !defined $featureGroup->[$i+1]->{'features'};
}
}
}
Instead of iterating by index, I'd advise that you iterate by element.
This makes it easy to filter at each step using grep or next:
for my $group (grep {$_->{features}} @$featureGroup) {
for my $feature (grep {$_->{featuredesc}} @{$group->{features}}) {
for my $desc (@{$feature->{featuredesc}}) {
print "$desc->{name}\n";
}
}
}
Outputs:
dpx-ppd
error
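The loops above assume that featuredesc always sits exactly two levels down. If the depth can vary, a recursive walk is one option. The following is only a sketch: featuredesc_names is a hypothetical helper, and the data is a trimmed copy of the structure in the question.

```perl
use strict;
use warnings;

# Trimmed-down copy of the structure from the question.
my $featureGroup = [
    { featureType => 'widget', name => 'dpx-shadow-fleet' },
    {
        name     => 'embed-landing-image',
        features => [
            {
                featureType => 'sequence',
                featuredesc => [
                    { name => 'dpx-ppd', featureType => 'widget' },
                    { name => 'error',   featureType => 'widget' },
                ],
            },
            { featureType => 'sequence', features => ['encode-landing-image'] },
        ],
    },
];

# Hypothetical helper: returns the name of every hash found in any
# featuredesc array, at whatever depth that array occurs.
sub featuredesc_names {
    my ($node) = @_;
    my @names;
    if (ref $node eq 'ARRAY') {
        push @names, featuredesc_names($_) for @$node;
    }
    elsif (ref $node eq 'HASH') {
        if (ref $node->{featuredesc} eq 'ARRAY') {
            push @names, map  { $_->{name} }
                         grep { ref $_ eq 'HASH' && defined $_->{name} }
                         @{ $node->{featuredesc} };
        }
        # Sort the keys so the traversal order is deterministic.
        push @names, featuredesc_names($node->{$_}) for sort keys %$node;
    }
    return @names;
}

my @names = featuredesc_names($featureGroup);
print "$_\n" for @names;
```

Plain strings and code references fall through both branches, so they are simply ignored by the walk.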

Problems with sorting a hash of hashes by value in Perl

I'm rather inexperienced with hashes of hashes - so I hope someone can help a newbie...
I have the following multi-level hash:
$OCRsimilar{$ifocus}{$theWord}{"form"} = $theWord;
$OCRsimilar{$ifocus}{$theWord}{"score"} = $OCRscore;
$OCRsimilar{$ifocus}{$theWord}{"distance"} = $distance;
$OCRsimilar{$ifocus}{$theWord}{"similarity"} = $similarity;
$OCRsimilar{$ifocus}{$theWord}{"length"} = $ilength;
$OCRsimilar{$ifocus}{$theWord}{"frequency"} = $OCRHashDict{$ikey}{$theWord};
Later, I need to sort each second-level element ($theWord) according to the score value. I've tried various things, but have failed so far. The problem seems to be that the sorting introduces new empty elements in the hash that mess things up.
What I have done (for example - I'm sure this is far from ideal):
my @flat = ();
foreach my $key1 (keys { $OCRsimilar{$ifocus} }) {
push @flat, [$key1, $OCRsimilar{$ifocus}{$key1}{'score'}];
}
for my $entry (sort { $b->[1] <=> $a->[1] } @flat) {
print STDERR "@$entry[0]\t@$entry[1]\n";
}
If I check things with Data::Dumper, the hash contains for example this:
'uroadcast' => {
'HASH(0x7f9739202b08)' => {},
'broadcast' => {
'frequency' => '44',
'length' => 9,
'score' => '26.4893274374278',
'form' => 'broadcast',
'distance' => 1,
'similarity' => 1
}
}
If I don't do the sorting, the hash is fine. What's going on?
Thanks in advance for any kind of pointers...!
Just tell sort what to sort on. No other tricks are needed.
#!/usr/bin/perl
use warnings;
use strict;
my %OCRsimilar = (
focus => {
word => {
form => 'word',
score => .2,
distance => 1,
similarity => 1,
length => 4,
frequency => 22,
},
another => {
form => 'another',
score => .01,
distance => 1,
similarity => 1,
length => 7,
frequency => 3,
},
});
for my $word (sort { $OCRsimilar{focus}{$a}{score} <=> $OCRsimilar{focus}{$b}{score} }
keys %{ $OCRsimilar{focus} }
) {
print "$word: $OCRsimilar{focus}{$word}{score}\n";
}
Pointers: perlreftut, perlref, sort.
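The question sorts from highest to lowest score, so here is a sketch of the same idea in descending order, with a tie-break on the word itself. The scores and the extra word third are made up for illustration.

```perl
use strict;
use warnings;

# Made-up scores for illustration.
my %OCRsimilar = (
    focus => {
        word    => { score => 0.2  },
        another => { score => 0.01 },
        third   => { score => 0.2  },
    },
);

my $words = $OCRsimilar{focus};

# Highest score first; fall back to alphabetical order on ties.
my @ranked = sort {
    $words->{$b}{score} <=> $words->{$a}{score}
        or $a cmp $b
} keys %$words;

print "$_\t$words->{$_}{score}\n" for @ranked;
```

Swapping $a and $b in the comparison is all it takes to reverse the order; the hash itself is never modified.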
What seems suspicious to me is this construct:
foreach my $key1 (keys { $OCRsimilar{$ifocus} }) {
Try dereferencing the hash, so it becomes:
foreach my $key1 (keys %{ $OCRsimilar{$ifocus} }) {
Otherwise, you seem to be creating an anonymous hash and taking the keys of it, equivalent to this code:
foreach my $key1 (keys { $OCRsimilar{$ifocus} => undef }) {
Thus, I think $key1 would equal $OCRsimilar{$ifocus} inside the loop. Then, I think Perl will do auto-vivification when it encounters $OCRsimilar{$ifocus}{$key1}, adding the hash reference $OCRsimilar{$ifocus} as a new key to itself.
If you use warnings;, the program ought to complain Odd number of elements in anonymous hash.
Still, I don't understand why Perl doesn't do further auto-vivification and add 'score' as the key, showing something like 'HASH(0x7f9739202b08)' => { 'score' => undef }, in the Data dump.
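A likely explanation for that last point: when a nested element is only read, Perl creates the intermediate containers it needs to reach it, but not the final element. The sketch below reproduces the stray key with made-up data; it stringifies the reference by hand rather than using the keys { ... } construct from the question.

```perl
use strict;
use warnings;

my %OCRsimilar = ( uroadcast => { broadcast => { score => 26.49 } } );

# A reference used as a hash key is stringified, e.g. "HASH(0x...)".
my $bogus_key = "$OCRsimilar{uroadcast}";

# Merely reading a nested value creates the intermediate hash for
# $bogus_key, but not the final 'score' entry.
my $score = $OCRsimilar{uroadcast}{$bogus_key}{score};

my @keys_after = sort keys %{ $OCRsimilar{uroadcast} };
print "keys now: @keys_after\n";
print "score entry created: ",
    ( exists $OCRsimilar{uroadcast}{$bogus_key}{score} ? "yes" : "no" ), "\n";
```

So the read inside the push is enough to leave an empty 'HASH(0x...)' => {} entry behind, which matches the dump in the question.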

In Perl, how to use a variable value as a hash element

I am new to Perl, and can't find the answer to the question in the Learning Perl book.
For example, I have an array like:
my @loop=("op1_sel","op2_sel");
and two hash tables:
my %op1_sel=(
"bibuf","000",
"self","101"
);
my %op2_sel=(
"zero","1",
"temp","0"
);
Now I want to use the values in the array to select a hash table and look up a particular key,
for example:
foreach(@loop)
{
print ${$_}{"bibuf"} ;
}
But it does not seem to work. I know the ${$_} part is wrong; can anyone tell me how
to fix this?
Use nested hashes. Like this:
my %op;
# put a hash reference into hash, twice
$op{op1_sel} = \%op1_sel;
$op{op2_sel} = \%op2_sel;
# later ...
foreach (keys %op) {
print "bibuf of $_: $op{$_}->{bibuf}\n";
};
Or, long story short, just
my %op = (
op1_sel => {
foo => 1,
bar => 2,
# ...
},
op2_sel => {
# ...
},
);
The {} construct creates a reference to an anonymous hash and is the standard way of handling nested data structures.
See also perldoc perldsc.
You can't refer to lexical (my) variables using the ${$foo} syntax. You could probably make it work if they were package variables, but this would not be the right way to go about it.
The right way to do it is using a nested data structure.
I can see two obvious ways of doing it. You could either make an array of op_sel containing the inner hashes directly, or create a hash of hashes, and then index into that.
So "array of hashes":
my @op_sels = (
{
bibuf => '000',
self => '101',
},
{
zero => '1',
temp => '0',
},
);
for my $op (@op_sels) {
print $$op{bibuf};
}
and "hash of hashes":
my %op_sels = (
1 => {
bibuf => '000',
self => '101',
},
2 => {
zero => '1',
temp => '0',
},
);
for my $op_key (sort keys %op_sels) {
print $op_sels{$op_key}{bibuf};
}
You can use eval for this.
foreach(@loop)
{
eval "\%var = \%$_";
print $var{"bibuf"} ;
}
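String eval runs whatever text ends up in the array as code, so a lookup table of hash references is usually the safer route. A minimal sketch, assuming the array and hashes from the question; %table and @found are names introduced here for illustration:

```perl
use strict;
use warnings;

my @loop    = ("op1_sel", "op2_sel");
my %op1_sel = (bibuf => "000", self => "101");
my %op2_sel = (zero  => "1",   temp => "0");

# Map each name to a reference to the corresponding hash.
my %table = (op1_sel => \%op1_sel, op2_sel => \%op2_sel);

my @found;
for my $name (@loop) {
    my $sel = $table{$name} or next;      # skip names with no table entry
    next unless exists $sel->{bibuf};     # op2_sel has no "bibuf" key
    push @found, "$name: $sel->{bibuf}";
}
print "$_\n" for @found;
```

This keeps the order of the original array and skips hashes that lack the requested key instead of printing an undefined value.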