Hash content extraction based on condition - perl

I have a hash containing node data.
I am expecting hash content to be printed in -r_<count> and -d_<count> attributes.
Here is the script:
use strict; use warnings;
use Data::Dumper;
my %hash = (
'Network=Test,Cell=31' => [ 'Network=Test,Unit=RU-1-1,Port=A',
'Network=Test,Unit=RU-1-2,Port=A'
],
'Network=Test,Cell=32' => [ 'Network=Test,Unit=RU-1-1,Port=A',
'Network=Test,Unit=RU-1-2,Port=A'
],
'Network=Test,Cell=33' => [ 'Network=Test,Unit=RU-1-5,Port=A',
'Network=Test,Unit=RU-1-6,Port=A'
],
);
print "hash:\n".Dumper(\%hash);
my $count = 0;
foreach my $d (sort keys %hash) {
$count++;
print "-d_". $count."=".$d . "\n";
my %seen = ();
foreach my $r (sort #{$hash{$d}}) {
$seen{$r}++;
}
if ((keys %seen) > 0) {
my $uniq = join ("###",sort keys %seen);
print "-r_". $count . "=" . $uniq . "\n";
} else {
print "-r_". $count."="."NA\n";
}
}
And I am able to print output like below(current output):
-d_1=Network=Test,Cell=31
-r_1=Network=Test,Unit=RU-1-1,Port=A###Network=Test,Unit=RU-1-2,Port=A
-d_2=Network=Test,Cell=32
-r_2=Network=Test,Unit=RU-1-1,Port=A###Network=Test,Unit=RU-1-2,Port=A
-d_3=Network=Test,Cell=33
-r_3=Network=Test,Unit=RU-1-5,Port=A###Network=Test,Unit=RU-1-6,Port=A
But I want output to be printed like below (expected output):
-r_1=Network=Test,Unit=RU-1-1,Port=A
-d_1=Network=Test,Cell=31###Network=Test,Cell=32
-r_2=Network=Test,Unit=RU-1-2,Port=A
-d_2=Network=Test,Cell=31###Network=Test,Cell=32
-r_3=Network=Test,Unit=RU-1-5,Port=A
-d_3=Network=Test,Cell=33
-r_4=Network=Test,Unit=RU-1-6,Port=A
-d_4=Network=Test,Cell=33
The expected output is, the value of -r_<count> should be printed as singular (from %hash keys array value) and -d_<count> (from %hash keys) should printed.

The output is guided by the unique values of the arrays. Your output loop must therefore iterate over these.
(
'Network=Test,Unit=RU-1-1,Port=A',
'Network=Test,Unit=RU-1-2,Port=A',
'Network=Test,Unit=RU-1-5,Port=A',
'Network=Test,Unit=RU-1-6,Port=A',
)
However, for each of these, the output needs the associated keys. This means the output loop requires the following data:
(
'Network=Test,Unit=RU-1-1,Port=A' => [ 'Network=Test,Cell=31', 'Network=Test,Cell=32' ],
'Network=Test,Unit=RU-1-2,Port=A' => [ 'Network=Test,Cell=31', 'Network=Test,Cell=32' ],
'Network=Test,Unit=RU-1-5,Port=A' => [ 'Network=Test,Cell=33' ],
'Network=Test,Unit=RU-1-6,Port=A' => [ 'Network=Test,Cell=33' ],
)
Basically, your data structure is inside-out. But now that we know what we want, it's just a question of transforming the data structure into what we need.
my %foos_by_bar;
for my $foo (keys %hash) { # %hash_b
my $bars = $hash{$foo}; # %hash_a
for my $bar (#$bars) {
push #{ $foos_by_bar{$bar} }, $foo;
}
}
The output loop simply needs to iterate over the (possibly sorted) keys of %foos_by_bar, and #{ $foos_by_bar{$bar} } contains the data you need for -d.
Nothing's stopping you from iterating over the sorted keys of %foos_by_bar in the output loop to produce predictable output, but that won't necessarily give you the same order as in the question. If you need that specific order, you can use the following:
my #bars;
my %foos_by_bar;
for my $foo (sort keys %hash) { # %hash_b
my $bars = $hash{$foo}; # %hash_a
for my $bar (#$bars) {
push #bars, $bar if !$foos_by_bar{$bar};
push #{ $foos_by_bar{$bar} }, $foo;
}
}
In this case, the output loop would iterate over #bars.

Related

Search whether AoH value exists in same Hash

I have hash which contains some data.
I want my final %hash to be printed like this:
'UGroup=1' => [ 'C72', 'C73', 'C71' ]
Here is my script:
use Data::Dumper;
my %h = (
'C72' => [ 'S=2-1' ],
'C73' => [ 'S=3-1' ],
'C71' => [ 'S=91-1'],
'UGroup=1' => [ 'S=1-1',
'S=2-1',
'S=3-1',
'S=91-1'],
);
print Dumper(\%h);
foreach my $C (sort keys %h) {
next unless $C =~ /UGroup/;
for my $f (#{$h{$C}}){
print "\tf:$f\n";
#This is not correct, but wanted to do something like this.
push #{$hash{$C}}, $f if(exists $h{$f});
}
}
print Dumper(\%hash);
Here in example input hash I need to check if S=91-1 has any key? If yes then associate that key to value for %hash with its original key.
How can I do that?
You didn't name the things, so
S=91-1 shall be a snake,
C71 shall be a cow, and
UGroup=1 shall be a group.
Start by building this hash:
my %cows_by_snake = (
'S=91-1' => [ 'C71' ],
'S=2-1' => [ 'C72' ],
'S=3-1' => [ 'C73' ],
);
Just ignore the keys that of %h that are groups when you do so.
Once you built a hash, it's simply a question of doing the following:
Create an empty result hash.
For each group,
Create an empty collection of cows.
For each snake associated the the group,
Add the cows associated with the snake to the collection.
Eliminate the duplicates in the collection of cows.
Add the group and the associated cows to the result hash.
my #groups;
my #cows;
for my $cow_or_group (keys(%h)) {
if ($cow_or_group =~ /^UGroup=/) {
push #groups, $cow_or_group;
} else {
push #cows, $cow_or_group;
}
}
my %cows_by_snake;
for my $cow (#cows) {
for my $snake (#{ $h{$cow} }) {
push #{ $cows_by_snake{$snake} }, $cow;
}
}
my %results;
for my $group (#groups) {
my %group_cows;
for my $snake (#{ $h{$group} }) {
for my $cow (#{ $cows_by_snake{$snake} }) {
++$group_cows{$cow};
}
}
$results{$group} = [ sort keys %group_cows ];
}

Hash adding value without assignment [duplicate]

This question already has an answer here:
Hash in Perl adds key if it does not exist
(1 answer)
Closed 4 years ago.
I've got a script which contains 2 hashes and while printing out the contents I'm finding that the script is assigning a value to the 2nd hash without me doing it. I read through the 1st hash, then the 2nd, and then read through the entire 2nd hash after. It should only contain 1 entry in hash2, but it now contains 2 entries. How is the value James in hash2 getting assigned here?
my %hash1 = ();
my %hash2 = ();
$hash1{"James"}{"1 Main Street"}++;
$hash1{"John"}{"2 Elm Street"}++;
$hash2{"John"}{"3 Oak Street"}++;
foreach my $name (keys %hash1) {
print "Hash1 Name $name\n";
foreach my $address (keys %{$hash1{$name}}) {
print "Hash1 Address $address\n";
foreach my $address (keys %{$hash2{$name}}) {
print "Hash2 Address $address\n";
}
}
}
print "\n";
foreach my $name (keys %hash2) {
print "Hash2 Name $name\n";
foreach my $address (keys %{$hash2{$name}}) {
print "Hash2 Address $address\n";
}
}
output looks like this:
Hash1 Name James
Hash1 Address 1 Main Street
Hash1 Name John
Hash1 Address 2 Elm Street
Hash2 Address 3 Oak Street
Hash2 Name James
Hash2 Name John
Hash2 Address 3 Oak Street
The second value is being created when you are trying to read non-existan key from hash 2.
my %hash1 = ();
my %hash2 = ();
$hash1{"James"}{"1 Main Street"}++;
$hash1{"John"}{"2 Elm Street"}++;
$hash2{"John"}{"3 Oak Street"}++;
foreach my $name (keys %hash1) {
print "Hash1 Name $name\n";
foreach my $address (keys %{$hash1{$name}}) {
print "Hash1 Address $address\n";
next unless exists $hash2{$name}; # check if the key exists in second hash before trying to use the key in $hash2
foreach my $address (keys %{$hash2{$name}}) { #second value gets created here
print "Hash2 Address $address\n";
}
}
}
print "\n";
foreach my $name (keys %hash2) {
print "Hash2 Name $name\n";
foreach my $address (keys %{$hash2{$name}}) {
print "Hash2 Address $address\n";
}
}
When you used an undefined value as if it's a reference, Perl makes the reference sort that you wanted then tries to perform the operation. This is called "auto-vivification".
Here's a small demonstration. A variable starts out as undefined. You then treat it as an array reference (the dereference to get the 0th element):
use Data::Dumper;
my $empty;
print Dumper( $empty );
my $value = $empty->[0];
print Dumper( $empty );
Perl converts $empty to an array reference then tries to get the 0th element from that. You are left with an array reference where you formerly had undef:
$VAR1 = undef;
$VAR1 = [];
This is intended behavior.
Take it one step further. Put that undef inside an array and treat that element as if it's an array reference:
use Data::Dumper;
my #array = ( 1, undef, 'red' );
print Dumper( \#array );
my $value = $array[1]->[0];
print Dumper( \#array );
Now the second element is an empty array reference:
$VAR1 = [
1,
undef,
'red'
];
$VAR1 = [
1,
[],
'red'
];
Take it another step further. Don't store the undef value. Instead, access an array position past the last item in the array:
use Data::Dumper;
my #array = ( 1, 'red' );
print Dumper( \#array );
my $value = $array[2]->[0];
print Dumper( \#array );
Now you get an array reference element in your array. It's one element longer now:
$VAR1 = [
1,
'red'
];
$VAR1 = [
1,
'red',
[]
];
Had you gone further out (say, element 5), the interstitial elements up to the element you wanted would have been "filled in" with undef:
use Data::Dumper;
my #array = ( 1, 'red' );
print Dumper( \#array );
my $value = $array[5]->[0];
print Dumper( \#array );
$VAR1 = [
1,
'red'
];
$VAR1 = [
1,
'red',
undef,
undef,
undef,
[]
];
A hash works the same way, and that's what you are seeing. When you want to check if there is a second-level key under James, Perl needs to create the James key and give it an empty hash ref value to it can check that. That second-level key is not there, but the first-level key of 'James' sticks around:
use Data::Dumper;
my %hash = (
John => { Jay => '137' },
);
print Dumper( \%hash );
if( exists $hash{James}{Jay} ) {
print $hash{James}{Jay};
}
print Dumper( \%hash );
Now you see an extra key:
$VAR1 = {
'John' => {
'Jay' => '137'
}
};
$VAR1 = {
'James' => {},
'John' => {
'Jay' => '137'
}
};
In this case, you don't like this feature, but you can turn it off with the no autovivification pragma. It's a CPAN module that you need to install first:
no autovivification;
use Data::Dumper;
my %hash = (
John => { Jay => '137' },
);
print Dumper( \%hash );
if( exists $hash{James}{Jay} ) {
print $hash{James}{Jay};
}
print Dumper( \%hash );
You don't get the extra key:
$VAR1 = {
'John' => {
'Jay' => '137'
}
};
$VAR1 = {
'John' => {
'Jay' => '137'
}
};
You might also like to read How can I check if a key exists in a deep Perl hash?. I show a method that allows you to inspect a nested hash without creating intermediate levels.

Printing content of ARRAY inside hash

how do I print a content of an array inside the hash? I am using Dumper so you can see the data that I am parsing.
print Dumper \%loginusers;
for my $key ( keys %loginusers ) {
my $value = $loginusers{$key};
print "$key => $value\n";
}
printf "%s %-32s %-18s\n","User","Hostname","Since";
The output is
$VAR1 = {
'server1.localdomain.com:8080' => [
', 'user=user1
' 'since=2017-03-10 13:53:27
]
};
server1.localdomain.com:8080 => ARRAY(0x1584748)
User Hostname Since
As you can see there is an ARRAY(0x1584748) and I don't know how to get that value inside from the hash.
What I would like to see is something like:
User Hostname Since
user1 server1.localdomain.com:8080 2017-03-10 13:53:27
Thank you very much for someone that can help.
Update:
So after trying this to see the data how it looks:
foreach my $key (keys %loginusers)
{
print "For $key:\n";
print "\t|$_|\n" for #{$loginusers{$key}};
}
The output looks like this:
For server1.localdomain.com:8080:
| |user=user1
| |since=2017-03-10 13:53:27
Update:
tried the add these on the code:
foreach my $key (keys %loginusers)
{
my #fields =
map { s/^\s*//; s/\s*\Z//; s/\s*\n\s*/ /g; $_ }
grep { /\S/ }
#{$loginusers{$key}};
print "For $key:\n";
print "$_\n" for #fields;
}
And using the both sample code:
printf "%-8s %-32s %s\n", qw(User Hostname Since);
foreach my $key (keys %loginusers)
{
my %field = map { /\s*(.*?)=\s*(.*)/ } #{$loginusers{$key}};
my ($host, $rgsender, $port) = split /:/, $key;
printf "%-8s %-32s %s\n", $field{user}, $host, $field{since};
}
my $newusers;
for my $host ( keys %loginusers ) {
local $/ = "\r\n"; #localised "input record separator" for the "chomp"
%{$newusers->{$host}} = map { chomp; split /=/,$_,2 } #{$loginusers{$host}};
}
undef %loginusers; #not needed anymore
#print "NEW STRUCTURE: ", Dumper $newusers;
printline( qw(User Hostname Since) );
printline($newusers->{$_}{user}, $_, $newusers->{$_}{since}) for (keys %$newusers);
sub printline { printf "%-8s %-32s %-18s\n", #_; }
and here is the results:
User Hostname Since
user1 server1.localdomain.com:8080 2017-03-10 13:53:27
User Hostname Since
user1 server1.localdomain.com:8080 2017-03-10 13:53:27
A hash value is a scalar, and it can take a reference. This is how we build complex data structures. Yours apparently have arrayrefs, so you need to dereference them. Something like
foreach my $key (keys %hash) {
say "$key => #{$hash{key}}";
}
See the tutorial perlretut and the cookbook on data structures perldsc.
The strange output from Dumper indicates that there may be leading/trailing spaces around values (or worse), which need be cleaned out. Until this is clarified I'll assume data like
'server1.localdomain.com:8080' => ['user=user1', 'since=2017-03-10 13:53:27']
In order to get the desired output you need to split each element
printf "%-8s %-32s %s\n", qw(User Hostname Since);
foreach my $key (keys %hash)
{
my ($user, $since) = map { /=\s*(.*)/ } #{$hash{$key}};
printf "%-8s %-32s %s\n", $user, $key, $since;
}
For each value, we dereference it and pass that through map. The code in maps block, that is applied to each element, pulls what is after =. Given the data, the first one is the user and the second one is timestamp. Since this is an array (and not a hash) I assume that the order is fixed. If not, get strings from both sides of = and analyze them to see which one goes where. Or better use a hash
my %field = map { /\s*(.*?)=\s*(.*)/ } #{$hash{$key}};
where .*? is the non-greedy version of .*, capturing until the first =. Then print as
printf "%-8s %-32s %s\n", $field{user}, $key, $field{since};
and you don't rely on the order in the arrayref. See the answer by jm666 for a nice and consistent approach building this from the beginning.
With the hash shown above this prints
User Hostname Since
user1 server1.localdomain.com:8080 2017-03-10 13:53:27
I've used 8 and 32 widths based on shown data. For more precision, there are modules for tabular output. If you do it by hand you need to pre-process and find the longest word for each column among keys and/or values, and then use those lengths in the second pass with printf.
It appears that Dumper is getting confused by strange data. To see what we have do
foreach my $key (keys %loginusers)
{
print "For $key:\n";
print "\t|$_|\n" for #{$loginusers{$key}};
}
To clean up the data you can try
foreach my $key (keys %loginusers)
{
my #fields =
map { s/^\s*//; s/\s*$//; s/\s*\R\s*/ /g; $_ }
grep { /\S/ }
#{$loginusers{$key}};
print "For $key:\n";
print "$_\n" for #fields;
}
The grep takes an input list and filters out those elements for which the code inside its block evaluates false. Here we require at least one non-space character. Then its output goes into map, which removes all leading and trailing whitespace, and replaces all newlines with spaces.
The your data-structure isn't very nice. I would convert it to some better, using:
#convert to better structure
my $newusers;
for my $host ( keys %loginusers ) {
%{$newusers->{$host}} = map { chomp; split /=/,$_,2 } #{$loginusers{$host}};
}
undef %loginusers; #the old not needed anymore
print "NEW STRUCTURE: ", Dumper $newusers;
The dump now looks like:
NEW STRUCTURE: $VAR1 = {
'server1.localdomain.com:8080' => {
'user' => 'user1',
'since' => '2017-03-10 13:53:27'
}
};
after the above the printing is simple:
printline( qw(User Hostname Since) );
printline($newusers->{$_}{user}, $_, $newusers->{$_}{since}) for (keys %$newusers);
sub printline { printf "%-8s %-32s %-18s\n", #_; }
For the explanation read #zdim's excellent answer (and accept his answer :))
full code
use 5.014;
use warnings;
use Data::Dumper;
my %loginusers = (
'server1.localdomain.com:8080' => [
"user=user1\r\n", # you probably have the \r too
"since=2017-03-10 13:53:27\r\n",
]
);
say "OLD STRUCTURE: ", Dumper \%loginusers;
#convert to better structure
my $newusers;
for my $host ( keys %loginusers ) {
%{$newusers->{$host}} = map { s/[\r\n]//g; split /=/, $_, 2 } #{$loginusers{$host}}; #removes all \r and \n
}
undef %loginusers; #not needed anymore
say "NEW STRUCTURE: ", Dumper $newusers;
printline( qw(User Hostname Since) );
printline($newusers->{$_}{user}, $_, $newusers->{$_}{since}) for (keys %$newusers);
sub printline { printf "%-8s %-32s %-18s\n", #_; }
result:
OLD STRUCTURE: $VAR1 = {
'server1.localdomain.com:8080' => [
'user=user1
',
'since=2017-03-10 13:53:27
'
]
};
NEW STRUCTURE: $VAR1 = {
'server1.localdomain.com:8080' => {
'user' => 'user1',
'since' => '2017-03-10 13:53:27'
}
};
User Hostname Since
user1 server1.localdomain.com:8080 2017-03-10 13:53:27
EDIT
You probably have the \r in your data too. See the updated code.

Hash doesn't print in Perl

I have a hash:
while( my( $key, $value ) = each %sorted_features ){
print "$key: $value\n";
}
but I cannot obtain the correct value for $value. It gives me:
intron: ARRAY(0x3430440)
source: ARRAY(0x34303b0)
exon: ARRAY(0x34303f8)
sig_peptide: ARRAY(0x33f0a48)
mat_peptide: ARRAY(0x3430008)
Why is it?
Your values are array references. You need to do something like
while( my( $key, $value ) = each %sorted_features ) {
print "$key: #$value\n";
}
In other words, dereference the reference. If you are unsure what your data looks like, a good idea is to use the Data::Dumper module:
use Data::Dumper;
print Dumper \%sorted_features;
You will see something like:
$VAR1 = {
'intron' => [
1,
2,
3
]
};
Where { denotes the start of a hash reference, and [ an array reference.
You can use also Data::Dumper::Pertidy which runs the output of Data::Dump through Perltidy.
#!/usr/bin/perl -w
use strict;
use Data::Dumper::Perltidy;
my $data = [{title=>'This is a test header'},{data_range=>
[0,0,3, 9]},{format => 'bold' }];
print Dumper $data;
Prints:
$VAR1 = [
{ 'title' => 'This is a test header' },
{ 'data_range' => [ 0, 0, 3, 9 ] },
{ 'format' => 'bold' }
];
Your hash values are array references. You need to write additional code to display the contents of these arrays, but if you are just debugging then it is probably simpler to use Data::Dumper like this
use Data::Dumper;
$Data::Dumper::Useqq = 1;
print Dumper \%sorted_features;
And, by the way, the name %sorted_features of your hash worries me. hashes are inherently unsorted, and the order that each retrieves the elements is essentially random.

how to declare array reference in hash refrence

my $memType = [];
my $portOp = [];
my $fo = "aster.out.DRAMA.READ.gz";
if($fo =~/aster.out\.(.*)\.(.*)\.gz/){
push (#{$memType},$1);
push (#{$portOp},$2);
}
print Dumper #{$memType};
foreach my $mem (keys %{$portCapability->{#{$memType}}}){
//How to use the array ref memType inside a hash//
print "entered here\n";
//cannot post the rest of the code for obvious reasons//
}
I am not able to enter the foreach loop . Can anyone help me fix it?
Sorry this is not the complete code . Please help me.
%{$portCapability->{#{$memType}}}
This doesn't do what you may think it means.
You treat $portCapability->{#{$memType}} as a hash reference.
The #{$memType} is evaluated in scalar context, thus giving the size of the array.
I aren't quite sure what you want, but would
%{ $portCapability->{ $memType->[0] } }
work?
If, however, you want to slice the elements in $portCapability, you would need somethink like
#{ $portCapability }{ #$memType }
This evaluates to a list of hashrefs. You can then loop over the hashrefs, and loop over the keys in an inner loop:
for my $hash (#{ $portCapability }{ #$memType }) {
for my $key (keys %$hash) {
...;
}
}
If you want a flat list of all keys of the inner hashes, but don't need the hashes themselves, you could shorten above code to
for my $key (map {keys %$_} #{ $portCapability }{ #$memType }) {
...;
}
I think what you want is this:
my $foo = {
asdf => {
a => 1, b => 2,
},
foo => {
c => 3, d => 4
},
bar => {
e => 5, f => 6
}
};
my #keys = qw( asdf foo );
foreach my $k ( map { keys %{ $foo->{$_} } } #keys ) {
say $k;
}
But you do not know which of these $k belongs to which key of $foo now.
There's no direct way to get the keys of multiple things at the same time. It doesn't matter if these things are hashrefs that are stored within the same hashref under different keys, or if they are seperate variables. What you have to do is build that list yourself, by looking at each of the things in turn. That's simply done with above map statement.
First, look at all the keys in $foo. Then for each of these, return the keys inside that element.
my $memType = [];
my $portOp = [];
my $fo = “aster.out.DRAMA.READ.gz”;
if ($fo =~ /aster.out\.(\w+)\.(\w+)\.gz/ ) { #This regular expression is safer
push (#$memType, $1);
push (#$portOp, $2);
}
print Dumper “#$memType”; #should print “DRAMA”
#Now if you have earlier in your program the hash %portCapability, your code can be:
foreach $mem (#$memType) {
print $portCapability{$mem};
}
#or if you have the hash $portCapability = {…}, your code can be:
foreach $mem (#$memType) {
print $portCapability->{$mem};
}
#Hope it helps