Join two hashes to one [duplicate] - hash

This question already has answers here:
How can I combine hashes in Perl?
(4 answers)
Closed 8 years ago.
I process several XML files by calling recursive method process_node:
for my $file (#ARGV) {
my $doc = XML::LibXML->load_xml(location => $file);
my $report;
my $items = [];
process_node($doc->getDocumentElement, $report, $items);
print Dumper($items);
}
They all have similar structure:
A big parent node named CalRunnerReport with attributes BSN, ProjectName, StationName
And many children nodes named TestItem
I'm trying to prepare a array of hashes referenced by the variable $items- to be used as data source for DataTables.net (an HTML table component):
sub process_node($$$) {
my ($node, $report, $items) = #_;
return unless $node->nodeType == XML_ELEMENT_NODE;
if ($node->nodeName eq 'CalRunnerReport') {
my $attr = get_attributes($node);
$report = {
BSN => $attr->{BSN},
ProjectName => $attr->{ProjectName},
StationName => $attr->{StationName},
}
} elsif ($node->nodeName eq 'TestItem') {
my $attr = get_attributes($node);
push #$items, [ # XXX fails to create a hash
%$report,
%$attr,
];
}
for my $child ($node->getChildnodes) {
process_node($child, $report, $items);
}
}
sub get_attributes($) {
my $node = shift;
my $attr = {};
for my $a ($node->attributes) {
my $key = $a->name;
my $val = $a->value;
$attr->{$key} = $val;
}
return $attr;
}
However in the Data:Dumper output I see that the push statement above hasn't created a hash, but a list:
[
'BSN',
'1147386447',
'ProjectName',
'R089',
'StationName',
'B',
'ExecutionTime',
'00:00:00',
'Result',
'PASS',
'EndTime',
'03/09/2013 21:00:03',
'StartTime',
'03/09/2013 21:00:03',
'Name',
'RecordOperationParameter'
],
[
'BSN',
'1147386447',
'ProjectName',
'R089',
'StationName',
'B',
'ExecutionTime',
'00:00:00',
'Result',
'PASS',
'EndTime',
'03/09/2013 21:00:03',
'StartTime',
'03/09/2013 21:00:03',
'Name',
'ClearLimitTestPendingList'
]
Is there please a nice way to combine the 2 hashes %$report and %$attr to a single hash - without looping through all their keys?

You want something like this:
push #$items, { %$report, %$attr };
Order matters - in this case, any shared keys will have the value from $attr rather than the value from $report in the result hash.

Related

Perl get value of a key

I have been stuck in trying to create an array of keys (example_com,example_ca ..etc) if they are set to 1, I have tried using for loop and foreach loop, but keep getting ARRAY# error.
$VAR1 = [
{
'example_com' => '1',
'example_ca' => '1'
}
];
Thanks
This will be because you have an array containing a hash. The array is one element long.
So you 'get' to the hash, by dereferencing element zero.
Thus:
my $hash_ref = $VAR1->[0];
print join "\n", keys %{$hash_ref},"\n";
foreach my $key ( keys %{$VAR1->[0]} ) {
print "$key => $VAR1->[0]{$key}\n";
}
Exactly for you source data:
my #array_of_keys = ();
for( keys %{ $VAR1->[0] } ) {
push #array_of_keys, $_ if $VAR1->[0]{ $_ } eq '1';
}
print "Keys with 1: #array_of_keys";
An expanded example of how to get an array of keys if you have multiple hashes in your container array:
my $VAR1 = [
{
'example_com' => '1',
'example_ca' => '1',
'not_set' => '0'
},
{
'EXAMPLE_com' => '1',
'EXAMPLE_ca' => '1',
'NOT_SET' => '0',
}
];
my #arrayOfHashes = #{$VAR1};
foreach my $array (#arrayOfHashes)
{
my #onlyOnes;
my #arrayOfKeys = sort keys %{$array};
foreach my $key (#arrayOfKeys)
{
next if ($array->{$key} ne 1);
push #onlyOnes, $key;
}
print "\nKey names:\n";
foreach my $key (#onlyOnes)
{
print "$key\n";
}
}
output:
Key names:
example_ca
example_com
Key names:
EXAMPLE_ca
EXAMPLE_com

Get value from hash of hashes

I would like to get value from hash of hashes but i do not. My code is :
sub test {
my $filename = $_[0];
open INFILE, ${filename} or die $!;
my %hashCount;
my #firstline = split('\t',<INFILE>);
shift(#firstline);
while (my $line = <INFILE>)
{
my %temp;
chomp($line);
my #line = split('\t', $line);
foreach my $cpt (1..$#line) {
$temp{$firstline[$cpt-1]}=$line[$cpt];
}
$hashCount{$line[0]}={%temp};
}
return %hashCount;
}
sub get_hash_of_hash {
my $h = shift;
foreach my $key (keys %$h) {
if( ref $h->{$key}) {
get_hash_of_hash( $h->{$key} );
}
else {
say $h->{$key};
}
}
}
And when i display my hash :
$VAR10679 = 'M00967_43_1106_2493_14707';
$VAR10680 = {
'A' => '1',
'B' => '0',
'C' => '1',
'D' => '0',
'E' => '0'
};
My first function return my hash of hashes and i get my specific value with the second function.
So I want to get value like that :
my %hashTest = test("FILE.txt");
get_hash_of_hash(%hashTest,"M00967_43_1106_2493_14707","A")
//return value '1'
You can either access nested elements like
$hash{keyA}{keyB}
or we can write a function that walks the data structure, like
sub walk {
my ($hashref, #keys) = #_;
my $pointer = $hashref;
for my $key (#keys) {
if (exists $pointer->{$key}) {
$pointer = $pointer->{$key};
} else {
die "No value at ", join "->", #keys;
}
}
return $pointer;
}
which can be used like
my %hash = (
'M00967_43_1106_2493_14707' => {
'A' => '1',
'B' => '0',
'C' => '1',
'D' => '0',
'E' => '0'
},
);
say walk(\%hash, 'M00967_43_1106_2493_14707', 'A');
Note: When using Data::Dumper, pass references to the Dump function:
print Dump \%hash; # not print Dump %hash
This is neccessary to show the correct data structure.
Your hash holds references to hashes.
You can access them like this:
$hashTest{'M00967_43_1106_2493_14707'}{'A'};
See perlref for more info
Use this subroutine..
sub get_hash_of_hash {
my $h = shift;
foreach my $key (keys %$h) {
if( ref $h->{$key}) {
get_hash_of_hash( $h->{$key} );
}
else {
print $h->{$key};
}
}
}

not able to access hash of hash of array values

I have written the following code in Perl. The code is reading a pdb file and getting some values. Ignore the top part of the code,where everything is working perfect.
Problem is in the sub-routine part, where I try to store arrays in the hash3 with model as key another key position
the array values can be accessed inside the if condition using this :
$hash3{$model}{$coordinates}[1].
but when I go out of all foreach loop and try to access the elements I only get one value.
Please look at the end foreach loop and tell me is it the wrong way to access the hash values.
The pdb file I am using can be downloaded from this link http://www.rcsb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId=1NZS
#!/usr/bin/perl
open(IN,$ARGV[0]);
my #phosphosites;
my $model=1;
my %hash3;
while(<IN>)
{
#findmod(#line);
#finddist;
#findfreq;
if((/^MODRES/) && (/PHOSPHO/))
{
#line=split;
push(#phosphosites, $line[2]);
#print "$line[4]";
}
foreach $elements (#phosphosites){
if(/^HETATM\s+\d+\s+CA\s+$i/)
{
#line1=split;
#print "$line1[5]";
#print "$line1[6] $line1[7] $line1[8]\n";
push(#phosphositesnum, $line1[5]);
}
}
$pos=$line1[5];
#findspatial(\#line,\#line1);
}
my #ori_data=removeDuplicates(#phosphositesnum);
sub removeDuplicates {
my %seen = ();
my #vals = ();
foreach my $i (#_) {
unless ($seen{$i}) {
push #vals, $i;
$seen{$i} = 1;
}
}
return #vals;
}
$a=(#phosphosites);
print "$a\n";
print "#phosphosites\n";
print "#ori_data\n";
close(IN);
open(IN1,$ARGV[0]);
my (#data)=<IN1>;
spatial(\#ori_data);
sub spatial {
my #spatial_array1=#{$_[0]};
foreach $coordinates(#spatial_array1)
{
$model=1;
{foreach $data1(#data){
if($data1=~ m/^HETATM\s+\d+\s+CA\s+[A-Z]*\s+[A-Z]*\s+$coordinates/)
{
#cordivals=split(/\s+/,$data1);
push #{ $sphash{$model} },[$cordivals[6], $cordivals[7], $cordivals[8]];
$hash3{$model}{$coordinates}= \#cordivals;
#print "$model $coordinates $hash3{$model}{$coordinates}[6] $hash3{$model}{$coordinates}[7] $hash3{$model}{$coordinates}[8]\n";
#print "$model $sphash{$model}[$i][0] $sphash{$model}[$i][1] $sphash{$model}[$i][2]\n";
}
elsif($data1=~ m/^ENDMDL/)
{
$model++;
}
#print "$model $coordinates $hash3{$model}{$coordinates}[6] $hash3{$model}{$coordinates}[7] $hash3{$model}{$coordinates}[8]\n";
}
}
}
#foreach $z1 (sort keys %hash3)
# {
# foreach $z2(#spatial_array1){
# print "$z1 $z2";
# print "$hash3{$z1}{$z2}[6]\n";
# print "$z2\n";
# }
# }
}
After using the Data::Dumper option it is giving me this kind of output
$VAR1 = {
'11' => {
'334' => [
'HETATM',
'115',
'CA',
'SEP',
'A',
'343',
'-0.201',
'-2.884',
'1.022',
'1.00',
'99.99',
'C'
],
'342' => $VAR1->{'11'}{'334'},
'338' => $VAR1->{'11'}{'334'},
'335' => $VAR1->{'11'}{'334'},
'340' => $VAR1->{'11'}{'334'},
'343' => $VAR1->{'11'}{'334'},
'336' => $VAR1->{'11'}{'334'}
},
'7' => {
'334' => $VAR1->{'11'}{'334'},
'342' => $VAR1->{'11'}{'334'},
'338' => $VAR1->{'11'}{'334'},
'335' => $VAR1->{'11'}{'334'},
'340' => $VAR1->{'11'}{'334'},
'343' => $VAR1->{'11'}{'334'},
'336' => $VAR1->{'11'}{'334'}
},
'2' => {
'334' => $VAR1->{'11'}{'334'},
'342' => $VAR1->{'11'}{'334'},
...
Change:
#cordivals=split(/\s+/,$data1);
to:
my #cordivals=split(/\s+/,$data1);
What seems to be happening is that all the hash elements contain references to the same array variable, because you're not making the variable local to that iteration.
In general, you should use my with all variables.

Perl Working On Two Hash References

I would like to compare the values of two hash references.
The data dumper of my first hash is this:
$VAR1 = {
'42-MG-BA' => [
{
'chromosome' => '19',
'position' => '35770059',
'genotype' => 'TC'
},
{
'chromosome' => '2',
'position' => '68019584',
'genotype' => 'G'
},
{
'chromosome' => '16',
'position' => '9561557',
'genotype' => 'G'
},
And the second hash is similar to this but with more hashes in the array. I would like to compare the genotype of my first and second hash if the position and the choromosome matches.
map {print "$_= $cave_snp_list->{$_}->[0]->{chromosome}\n"}sort keys %$cave_snp_list;
map {print "$_= $geno_seq_list->{$_}->[0]->{chromosome}\n"}sort keys %$geno_seq_list;
I could do that for the first array of the hashes.
Could you help me in how to work for all the arrays?
This is my actual code in full
#!/software/bin/perl
use strict;
use warnings;
use Getopt::Long;
use Benchmark;
use Config::Config qw(Sequenom.ini);
useDatabase::Conn;
use Data::Dumper;
GetOptions("sam=s" => \my $sample);
my $geno_seq_list = getseqgenotypes($sample);
my $cave_snp_list = getcavemansnpfile($sample);
#print Dumper($geno_seq_list);
print scalar %$geno_seq_list, "\n";
foreach my $sam (keys %{$geno_seq_list}) {
my $seq_used = $geno_seq_list->{$sam};
my $cave_used = $cave_snp_list->{$sam};
print scalar(#$geno_seq_list->{$_}) if sort keys %$geno_seq_list, "\n";
print scalar(#$cave_used), "\n";
#foreach my $seq2com (# {$seq_used } ){
# foreach my $cave2com( # {$cave_used} ){
# print $seq2com->{chromosome},":" ,$cave2com->{chromosome},"\n";
# }
#}
map { print "$_= $cave_snp_list->{$_}->[0]->{chromosome}\n" } sort keys %$cave_snp_list;
map { print "$_= $geno_seq_list->{$_}->[0]->{chromosome}\n" } sort keys %$geno_seq_list;
}
sub getseqgenotypes {
my $snpconn;
my $gen_list = {};
$snpconn = Database::Conn->new('live');
$snpconn->addConnection(DBI->connect('dbi:Oracle:pssd.world', 'sn', 'ss', { RaiseError => 1, AutoCommit => 0 }),
'pssd');
#my $conn2 =Database::Conn->new('live');
#$conn2->addConnection(DBI->connect('dbi:Oracle:COSI.world','nst_owner','nst_owner', {RaiseError =>1 , AutoCommit=>0}),'nst');
my $id_ind = $snpconn->execute('snp::Sequenom::getIdIndforExomeSample', $sample);
my $genotype = $snpconn->executeArrRef('snp::Sequenom::getGenotypeCallsPosition', $id_ind);
foreach my $geno (#{$genotype}) {
push #{ $gen_list->{ $geno->[1] } }, {
chromosome => $geno->[2],
position => $geno->[3],
genotype => $geno->[4],
};
}
return ($gen_list);
} #end of sub getseqgenotypes
sub getcavemansnpfile {
my $nstconn;
my $caveman_list = {};
$nstconn = Database::Conn->new('live');
$nstconn->addConnection(
DBI->connect('dbi:Oracle:CANP.world', 'nst_owner', 'NST_OWNER', { RaiseError => 1, AutoCommit => 0 }), 'nst');
my $id_sample = $nstconn->execute('nst::Caveman::getSampleid', $sample);
#print "IDSample: $id_sample\n";
my $file_location = $nstconn->execute('nst::Caveman::getCaveManSNPSFile', $id_sample);
open(SNPFILE, "<$file_location") || die "Error: Cannot open the file $file_location:$!\n";
while (<SNPFILE>) {
chomp;
next if /^>/;
my #data = split;
my ($nor_geno, $tumor_geno) = split /\//, $data[5];
# array of hash
push #{ $caveman_list->{$sample} }, {
chromosome => $data[0],
position => $data[1],
genotype => $nor_geno,
};
} #end of while loop
close(SNPFILE);
return ($caveman_list);
}
The problem that I see is that you're constructing a tree for generic storage of data, when what you want is a graph, specific to the task. While you are constructing the record, you could also be constructing the part that groups data together. Below is just one example.
my %genotype_for;
my $record
= { chromosome => $data[0]
, position => $data[1]
, genotype => $nor_geno
};
push #{ $gen_list->{ $geno->[1] } }, $record;
# $genotype_for{ position }{ chromosome }{ name of array } = genotype code
$genotype_for{ $data[1] }{ $data[0] }{ $sample } = $nor_geno;
...
return ( $caveman_list, \%genotype_for );
In the main line, you receive them like so:
my ( $cave_snp_list, $geno_lookup ) = getcavemansnpfile( $sample );
This approach at least allows you to locate similar position and chromosome values. If you're going to do much with this, I might suggest an OO approach.
Update
Assuming that you wouldn't have to store the label, we could change the lookup to
$genotype_for{ $data[1] }{ $data[0] } = $nor_geno;
And then the comparison could be written:
foreach my $pos ( keys %$small_lookup ) {
next unless _HASH( my $sh = $small_lookup->{ $pos } )
and _HASH( my $lh = $large_lookup->{ $pos } )
;
foreach my $chrom ( keys %$sh ) {
next unless my $sc = $sh->{ $chrom }
and my $lc = $lh->{ $chrom }
;
print "$sc:$sc";
}
}
However, if you had limited use for the larger list, you could construct the specific case
and pass that in as a filter when creating the longer list.
Thus, in whichever loop creates the longer list, you could just go
...
next unless $sample{ $position }{ $chromosome };
my $record
= { chromosome => $chromosome
, position => $position
, genotype => $genotype
};
...

Perl - Removing unwanted elements from an arrayref

I'm writing a script that parses the "pure-ftpwho -s" command to get a list of the current transfers. But when a user disconnects from the FTP and reconnects back and resumes a transfer, the file shows up twice. I want to remove the ghosted one with Perl. After parsing, here is what the arrayref looks like (dumped with Data::Dumper)
$VAR1 = [
{
'status' => 'DL',
'percent' => '20',
'speed' => '10',
'file' => 'somefile.txt',
'user' => 'user1',
'size' => '14648'
},
{
'status' => 'DL',
'percent' => '63',
'speed' => '11',
'file' => 'somefile.txt',
'user' => 'user1',
'size' => '14648'
},
{
'status' => 'DL',
'percent' => '16',
'speed' => '60',
'file' => 'somefile.txt',
'user' => 'user2',
'size' => '14648'
}
];
Here user1 and user2 are downloading the same file, but user1 appears twice because the first one is a "ghost". What's the best way to check and remove elements that I don't need (in this case the first element of the arrayref). The condition to check is that - If the "file" key and "user" key is the same, then delete the hashref that contains the smaller value of "percent" key (if they're the same then delete all except one).
If order in the original arrayref doesn't matter, this should work:
my %users;
my #result;
for my $data (#$arrayref) {
push #{ $users{$data->{user}.$data->{file}} }, $data;
}
for my $value (values %users) {
my #data = sort { $a->{percent} <=> $b->{percent} } #$value;
push #result, $data[-1];
}
This can definitely be improved for efficiency.
The correct solution in this case would have been to use a hash when parsing the log file. Put all information into a hash, say %log, keyed by user and file:
$log{$user}->{$file} = {
'status' => 'DL',
'percent' => '20',
'speed' => '10',
'size' => '14648'
};
etc. Latter entries in the log file would overwrite earlier ones. Alternatively, you can choose to overwrite entries with lower percent completed with ones that have higher completion rates.
Using a hash would get rid of a lot of completely superfluous code working around the choice of the wrong data structure.
For what it's worth, here's my (slightly) alternative approach. Again, it doesn't preserve the original order:
my %most_progress;
for my $data ( sort { $b->{percent} <=> $a->{percent} } #$data ) {
next if exists $most_progress{$data->{user}.$data->{file}};
$most_progress{$data->{user}.$data->{file}} = $data;
}
my #clean_data = values %most_progress;
This will preserve order:
use strict;
use warnings;
my $data = [ ... ]; # As posted.
my %pct;
for my $i ( 0 .. $#{$data} ){
my $r = $data->[$i];
my $k = join '|', $r->{file}, $r->{user};
next if exists $pct{$k} and $pct{$k}[1] >= $r->{percent};
$pct{$k} = [$i, $r->{percent}];
}
#$data = #$data[sort map $_->[0], values %pct];
my %check;
for (my $i = 0; $i <= $#{$arrayref}; $i++) {
my $transfer = $arrayref->[$i];
# check the transfer for user and file
my $key = $transfer->{user} . $transfer->{file};
$check{$key} = { } if ( !exists $check{$key} );
if ( $transfer->{percent} <= $check{$key}->{percent} ) {
# undefine this less advanced transfer
$arrayref->[$i] = undef;
} else {
# remove the other transfer
$arrayref->[$check{$key}->{index}] = undef if exists $check{$key}->{index};
# set the new standard
$check{$key} = { index => $i, percent => $transfer->{percent} }
}
}
# remove all undefined transfers
$arrayref = [ grep { defined $_ } #$arrayref ];
Variation on the theme with Perl6::Gather
use Perl6::Gather;
my #cleaned = gather {
my %seen;
for (sort { $b->{percent} <=> $a->{percent} } #$data) {
take unless $seen{ $_->{user} . $_->{file} }++;
}
};