Converting HoA to HoH with counting - perl

I have this code:
use 5.020;
use warnings;
use Data::Dumper;
my %h = (
k1 => [qw(aa1 aa2 aa1)],
k2 => [qw(ab1 ab2 ab3)],
k3 => [qw(ac1 ac1 ac1)],
);
my %h2;
for my $k (keys %h) {
$h2{$k}{$_}++ for (@{$h{$k}});
}
say Dumper \%h2;
It produces:
$VAR1 = {
'k1' => {
'aa2' => 1,
'aa1' => 2
},
'k3' => {
'ac1' => 3
},
'k2' => {
'ab1' => 1,
'ab3' => 1,
'ab2' => 1
}
};
Is it possible to write the above code another way (e.g. simpler or more compact)?

Honestly, I don't like the number of times $h2{$k} is evaluated.
my %h2;
for my $k (keys %h) {
my $src = $h{$k};
my $dst = $h2{$k} = {};
++$dst->{$_} for @$src;
}
A subroutine can help make the intent more obvious. Maybe.
sub counts { my %c; ++$c{$_} for @_; \%c }
$h2{$_} = counts(@{ $h{$_} }) for keys %h;
That can be simplified if you do the change in-place.
sub counts { my %c; ++$c{$_} for @_; \%c }
$_ = counts(@$_) for values %h;
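Another compact option, offered here only as a sketch and not taken from the answers above, is to build the whole result with a single map. Each key is paired with a reference to a fresh counting hash, so $h2{$k} is never looked up repeatedly. The printing loop at the end exists only to show the result in a stable order.

```perl
use strict;
use warnings;

my %h = (
    k1 => [qw(aa1 aa2 aa1)],
    k2 => [qw(ab1 ab2 ab3)],
    k3 => [qw(ac1 ac1 ac1)],
);

# For each key, count the array elements into a fresh %c,
# then pair the key with a reference to that hash.
my %h2 = map {
    my %c;
    $c{$_}++ for @{ $h{$_} };
    ( $_ => \%c );
} keys %h;

# Print in sorted order so the output is stable.
for my $k ( sort keys %h2 ) {
    print "$k: ", join( ', ', map { "$_=$h2{$k}{$_}" } sort keys %{ $h2{$k} } ), "\n";
}
```

Whether this reads better than the explicit loop is a matter of taste; the subroutine version is probably easier to reuse.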

Related

Split a hash into many hashes

If I have a hash how could I "break" it/"split" it into multiple hashes containing equal number of keys?
Basically splice in arrays seems to be close to what I need (loop/slice) but that works only for arrays.
So what's the best way to do this?
Update:
Or a way to remove at most X number of key-values so as to simulate the splice of arrays
Update
{ foo => 1, bar => 2, bla =>3}
To be
{ foo => 1 }, { bar => 2 }, { bla => 3 } if X = 1
or { foo => 1, bar => 2 }, { bla => 3 } if X = 2
or { foo => 1, bar => 2, bla => 3 } if X = 3
This should do what you want. On 5.20+, you can probably use the new slice syntax to simplify the code.
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
sub split_hash {
my ($x, $hash) = @_;
my @hashes;
while (%$hash) {
my @k = keys %$hash;
push @hashes, { map each %$hash, 1 .. $x };
delete @{ $hash }{ keys %{ $hashes[-1] } };
}
return @hashes
}
print Dumper([ split_hash($_, { foo => 1,
bar => 2,
bla => 3,
}
)]) for 1 .. 3;
Note that as written, the code deletes the original hash.
Similar to the solution provided by @choroba, but using splice, and it doesn't modify the passed hash:
use Data::Dumper;
use strict;
use warnings;
sub split_hash {
my ( $x, $hash ) = @_;
my @keys = keys %$hash;
my @hashes;
while ( my @subset = splice( @keys, 0, $x ) ) {
push @hashes, { map { $_ => $hash->{$_} } @subset };
}
return \@hashes;
}
print Dumper( [
split_hash(
$_,
{
foo => 1,
bar => 2,
bla => 3,
} ) ] ) for 1 .. 3;
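The remark above about the 5.20 slice syntax can be sketched as follows. This is an assumption about what was meant: a key/value hash slice (%$hash{...}) replaces the inner map. The name split_hash_kv is hypothetical, and the keys are sorted only so that the chunks come out in a predictable order.

```perl
use 5.020;    # key/value hash slices need Perl 5.20 or later; also enables strict and say
use warnings;

# split_hash_kv is a hypothetical name, not part of the answers above.
sub split_hash_kv {
    my ( $x, $hash ) = @_;
    my @keys = sort keys %$hash;    # sorted only to make the chunks predictable
    my @hashes;
    while ( my @subset = splice( @keys, 0, $x ) ) {
        # %$hash{@subset} returns the key/value pairs for the chosen keys
        push @hashes, { %$hash{@subset} };
    }
    return @hashes;
}

my @parts = split_hash_kv( 2, { foo => 1, bar => 2, bla => 3 } );
for my $part (@parts) {
    say join ',', map { "$_=$part->{$_}" } sort keys %$part;
}
```

Like the splice version, this leaves the passed hash untouched.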

Perl: Sorting hash of hash by value descending order

data :
%HoH = (
abc => {
value => "12",
},
xyz => {
number => "100",
},
pqr => {
digit => "5",
}
)
How do I sort the hash of hash by value in descending order?
Output
100
12
5
You can't sort a hash, because a hash does not keep its entries in any order. If you want them sorted, sort the keys by the number they hold and store the sorted keys in an array.
#!/usr/bin/perl
use strict;
use warnings;
my %HoH = (
abc => { value => 12 },
xyz => { value => 100},
pqr => { value => 5},
def => { value => 15},
hij => { value => 30},
);
my @sorted_keys = map { $_->[0] }
sort { $b->[1] <=> $a->[1] } # use numeric comparison
map { my $temp;
if ( exists $HoH{$_}{'value'} ) {
$temp = $HoH{$_}{'value'};
} elsif ( exists $HoH{$_}{'number'} ) {
$temp = $HoH{$_}{'number'};
} elsif ( exists $HoH{$_}{'digit'} ) {
$temp = $HoH{$_}{'digit'};
} else {
$temp = 0;
}
{[$_, $temp]} }
(keys %HoH);
for my $key (@sorted_keys) {
my $temp;
if ( exists $HoH{$key}{'value'} ) {
$temp = $HoH{$key}{'value'};
} elsif ( exists $HoH{$key}{'number'} ) {
$temp = $HoH{$key}{'number'};
} elsif ( exists $HoH{$key}{'digit'} ) {
$temp = $HoH{$key}{'digit'};
} else {
$temp = 0;
}
print $key . ":" . $temp ."\n";
}
Output:
xyz:100
hij:30
def:15
abc:12
pqr:5
This sorting technique is called the Schwartzian transform.
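The same transform can be written more compactly if the lookup of the number is pulled into a helper. The sketch below assumes each inner hash holds exactly one value, whatever its key is called; inner_number is a hypothetical name introduced for illustration.

```perl
use strict;
use warnings;

my %HoH = (
    abc => { value  => 12 },
    xyz => { number => 100 },
    pqr => { digit  => 5 },
);

# Hypothetical helper: returns the single number stored in an inner hash,
# or 0 if the inner hash is empty.
sub inner_number {
    my ($inner) = @_;
    my ($n) = values %$inner;
    return $n // 0;
}

my @sorted_keys =
    map  { $_->[0] }                             # 3. unwrap the key
    sort { $b->[1] <=> $a->[1] }                 # 2. sort numerically, descending
    map  { [ $_, inner_number( $HoH{$_} ) ] }    # 1. pair each key with its number
    keys %HoH;

print "$_:", inner_number( $HoH{$_} ), "\n" for @sorted_keys;
```

Read the chain from the bottom up: the keys are decorated with their sort value, sorted on it, and then undecorated.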
Given you're not actually using the keys for anything, you can flatten the data structure into a single array and then sort it:
use strict;
use warnings;
my %HoH = (
abc => {value => "12",},
xyz => {number => "100",},
pqr => {digit => "5",},
);
my @numbers = sort {$b <=> $a} map {values %$_} values %HoH;
print "$_\n" for @numbers;
Outputs:
100
12
5
However, if you want to use the additional key information, then you'll need to fold your hash of hashes into an array, and then you can sort however you like:
my @array;
while (my ($k, $ref) = each %HoH) {
while (my ($k2, $v) = each %$ref) {
push @array, [$k, $k2, $v];
}
}
@array = sort {$b->[2] <=> $a->[2]} @array;
use Data::Dump;
dd \@array;
Outputs:
[
["xyz", "number", 100],
["abc", "value", 12],
["pqr", "digit", 5],
]
I came up with this solution
#!/usr/bin/perl
use strict;
use warnings;
my %HoH = (
abc => {
value => "12",
},
xyz => {
number => "100",
},
pqr => {
digit => "5",
}
);
my %rever;
for my $TopKey(keys %HoH){
for my $value(values %{ $HoH{$TopKey} }){
push @{ $rever{$value} }, $TopKey;
}
}
my @nums = sort {$b <=> $a} (keys(%rever));
print $_, "\n" for @nums;
I reversed the values in case you still needed to use the key names.
This is how it looks after using Dumper.
$VAR1 = '100';
$VAR2 = [
'xyz'
];
$VAR3 = '12';
$VAR4 = [
'abc'
];
$VAR5 = '5';
$VAR6 = [
'pqr'
];

Perl adding Lines into a Multi-Dimensional Hash

Hello, I want to split a line and add the values to a multi-dimensional hash. This is what the lines look like:
__DATA__
49839382;Test1;bgsae;npvxs
49839384;Test2;bgsae;npvxs
49839387;Test3;bgsae;npvxs
So what I am doing now is:
my %prefix = map { chomp; split ';' } <DATA>;
But now I can only access Test1 with:
print $prefix{"49839382"}
But how can I also add the bgsae to the hash so I can access it with
$prefix{"49839382"}{"Test1"}
Thank you for your help.
What structure are you trying to build?
use Data::Dumper;
my %prefix = map { chomp (my @fields = split /;/); $fields[0] => { @fields[1 .. $#fields] } } <DATA>;
print Dumper \%prefix;
Output:
$VAR1 = {
'49839384' => {
'Test2' => 'bgsae',
'npvxs' => undef
},
'49839382' => {
'Test1' => 'bgsae',
'npvxs' => undef
},
'49839387' => {
'npvxs' => undef,
'Test3' => 'bgsae'
}
};
Or do you need a deeper hash?
my %prefix;
for (<DATA>) {
chomp;
my $ref = \%prefix;
for (split /;/) {
warn "[$_]";
$ref->{$_} = {};
$ref = $ref->{$_};
}
}
Returns:
$VAR1 = {
'49839384' => {
'Test2' => {
'bgsae' => {
'npvxs' => {}
}
}
},
'49839382' => {
'Test1' => {
'bgsae' => {
'npvxs' => {}
}
}
},
'49839387' => {
'Test3' => {
'bgsae' => {
'npvxs' => {}
}
}
}
};
I don't know what you need the data for, but at a guess you want something more like this.
It builds a hash of arrays, using the first field as the key for the data, and the remaining three in an array for the value. So you can access the test number as $data{'49839382'}[0] etc.
use strict;
use warnings;
my %data = map {
chomp;
my @fields = split /;/;
shift @fields => \@fields;
} <DATA>;
use Data::Dumper;
print Data::Dumper->Dump([\%data], ['*data']);
__DATA__
49839382;Test1;bgsae;npvxs
49839384;Test2;bgsae;npvxs
49839387;Test3;bgsae;npvxs
output
%data = (
'49839384' => [
'Test2',
'bgsae',
'npvxs'
],
'49839382' => [
'Test1',
'bgsae',
'npvxs'
],
'49839387' => [
'Test3',
'bgsae',
'npvxs'
]
);
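If named access is what you are after, a hash slice can assign all remaining fields to named keys in one step. This is only a sketch: the column names name, code1 and code2 are hypothetical, since the question does not say what the fields mean, and the lines are held in an array here instead of being read from DATA.

```perl
use strict;
use warnings;

# Hypothetical column names; the question does not say what the fields mean.
my @columns = qw(name code1 code2);

my @lines = (
    "49839382;Test1;bgsae;npvxs\n",
    "49839384;Test2;bgsae;npvxs\n",
    "49839387;Test3;bgsae;npvxs\n",
);

my %data;
for my $line (@lines) {
    chomp( my $copy = $line );
    my ( $id, @fields ) = split /;/, $copy;
    @{ $data{$id} }{@columns} = @fields;    # hash slice: assign all columns at once
}

print "$data{'49839382'}{name} $data{'49839382'}{code1}\n";    # prints "Test1 bgsae"
```

With this layout the second field is reached as $data{'49839382'}{name} and not by its position in an array.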

Perl Working On Two Hash References

I would like to compare the values of two hash references.
The data dumper of my first hash is this:
$VAR1 = {
'42-MG-BA' => [
{
'chromosome' => '19',
'position' => '35770059',
'genotype' => 'TC'
},
{
'chromosome' => '2',
'position' => '68019584',
'genotype' => 'G'
},
{
'chromosome' => '16',
'position' => '9561557',
'genotype' => 'G'
},
And the second hash is similar to this but with more hashes in the array. I would like to compare the genotypes of my first and second hash where the position and the chromosome match.
map {print "$_= $cave_snp_list->{$_}->[0]->{chromosome}\n"}sort keys %$cave_snp_list;
map {print "$_= $geno_seq_list->{$_}->[0]->{chromosome}\n"}sort keys %$geno_seq_list;
I could do that for the first array of the hashes.
Could you help me in how to work for all the arrays?
This is my actual code in full
#!/software/bin/perl
use strict;
use warnings;
use Getopt::Long;
use Benchmark;
use Config::Config qw(Sequenom.ini);
use Database::Conn;
use Data::Dumper;
GetOptions("sam=s" => \my $sample);
my $geno_seq_list = getseqgenotypes($sample);
my $cave_snp_list = getcavemansnpfile($sample);
#print Dumper($geno_seq_list);
print scalar %$geno_seq_list, "\n";
foreach my $sam (keys %{$geno_seq_list}) {
my $seq_used = $geno_seq_list->{$sam};
my $cave_used = $cave_snp_list->{$sam};
print scalar(@$geno_seq_list->{$_}) if sort keys %$geno_seq_list, "\n";
print scalar(@$cave_used), "\n";
#foreach my $seq2com (@ {$seq_used } ){
# foreach my $cave2com( @ {$cave_used} ){
# print $seq2com->{chromosome},":" ,$cave2com->{chromosome},"\n";
# }
#}
map { print "$_= $cave_snp_list->{$_}->[0]->{chromosome}\n" } sort keys %$cave_snp_list;
map { print "$_= $geno_seq_list->{$_}->[0]->{chromosome}\n" } sort keys %$geno_seq_list;
}
sub getseqgenotypes {
my $snpconn;
my $gen_list = {};
$snpconn = Database::Conn->new('live');
$snpconn->addConnection(DBI->connect('dbi:Oracle:pssd.world', 'sn', 'ss', { RaiseError => 1, AutoCommit => 0 }),
'pssd');
#my $conn2 =Database::Conn->new('live');
#$conn2->addConnection(DBI->connect('dbi:Oracle:COSI.world','nst_owner','nst_owner', {RaiseError =>1 , AutoCommit=>0}),'nst');
my $id_ind = $snpconn->execute('snp::Sequenom::getIdIndforExomeSample', $sample);
my $genotype = $snpconn->executeArrRef('snp::Sequenom::getGenotypeCallsPosition', $id_ind);
foreach my $geno (@{$genotype}) {
push @{ $gen_list->{ $geno->[1] } }, {
chromosome => $geno->[2],
position => $geno->[3],
genotype => $geno->[4],
};
}
return ($gen_list);
} #end of sub getseqgenotypes
sub getcavemansnpfile {
my $nstconn;
my $caveman_list = {};
$nstconn = Database::Conn->new('live');
$nstconn->addConnection(
DBI->connect('dbi:Oracle:CANP.world', 'nst_owner', 'NST_OWNER', { RaiseError => 1, AutoCommit => 0 }), 'nst');
my $id_sample = $nstconn->execute('nst::Caveman::getSampleid', $sample);
#print "IDSample: $id_sample\n";
my $file_location = $nstconn->execute('nst::Caveman::getCaveManSNPSFile', $id_sample);
open(SNPFILE, "<$file_location") || die "Error: Cannot open the file $file_location:$!\n";
while (<SNPFILE>) {
chomp;
next if /^>/;
my @data = split;
my ($nor_geno, $tumor_geno) = split /\//, $data[5];
# array of hash
push @{ $caveman_list->{$sample} }, {
chromosome => $data[0],
position => $data[1],
genotype => $nor_geno,
};
} #end of while loop
close(SNPFILE);
return ($caveman_list);
}
The problem that I see is that you're constructing a tree for generic storage of data, when what you want is a graph, specific to the task. While you are constructing the record, you could also be constructing the part that groups data together. Below is just one example.
my %genotype_for;
my $record
= { chromosome => $data[0]
, position => $data[1]
, genotype => $nor_geno
};
push @{ $gen_list->{ $geno->[1] } }, $record;
# $genotype_for{ position }{ chromosome }{ name of array } = genotype code
$genotype_for{ $data[1] }{ $data[0] }{ $sample } = $nor_geno;
...
return ( $caveman_list, \%genotype_for );
In the main line, you receive them like so:
my ( $cave_snp_list, $geno_lookup ) = getcavemansnpfile( $sample );
This approach at least allows you to locate similar position and chromosome values. If you're going to do much with this, I might suggest an OO approach.
Update
Assuming that you wouldn't have to store the label, we could change the lookup to
$genotype_for{ $data[1] }{ $data[0] } = $nor_geno;
And then the comparison could be written:
foreach my $pos ( keys %$small_lookup ) {
next unless _HASH( my $sh = $small_lookup->{ $pos } )
and _HASH( my $lh = $large_lookup->{ $pos } )
;
foreach my $chrom ( keys %$sh ) {
next unless my $sc = $sh->{ $chrom }
and my $lc = $lh->{ $chrom }
;
print "$sc:$lc\n";
}
}
However, if you had limited use for the larger list, you could construct the specific case
and pass that in as a filter when creating the longer list.
Thus, in whichever loop creates the longer list, you could just go
...
next unless $sample{ $position }{ $chromosome };
my $record
= { chromosome => $chromosome
, position => $position
, genotype => $genotype
};
...
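For the structures shown in the question, the comparison can also be sketched without changing how the lists are built: index one list by "chromosome:position" and walk the other. This is an illustration under assumptions, not the asker's real data: the records in the second list are invented, and the database calls are left out.

```perl
use strict;
use warnings;

# Sample data shaped like the dump in the question.
# The records in $cave_snp_list are invented for illustration.
my $geno_seq_list = {
    '42-MG-BA' => [
        { chromosome => '19', position => '35770059', genotype => 'TC' },
        { chromosome => '2',  position => '68019584', genotype => 'G' },
    ],
};
my $cave_snp_list = {
    '42-MG-BA' => [
        { chromosome => '19', position => '35770059', genotype => 'TC' },
        { chromosome => '2',  position => '68019584', genotype => 'A' },
        { chromosome => '7',  position => '1234',     genotype => 'C' },
    ],
};

my @report;
for my $sam ( sort keys %$geno_seq_list ) {
    # Index the caveman records by "chromosome:position" for direct lookup.
    my %cave_geno = map { ( "$_->{chromosome}:$_->{position}" => $_->{genotype} ) }
        @{ $cave_snp_list->{$sam} || [] };

    for my $rec ( @{ $geno_seq_list->{$sam} } ) {
        my $locus = "$rec->{chromosome}:$rec->{position}";
        next unless exists $cave_geno{$locus};    # no record at this chromosome and position
        my $verdict = $rec->{genotype} eq $cave_geno{$locus} ? 'match' : 'differ';
        push @report, "$sam $locus $rec->{genotype} $cave_geno{$locus} $verdict";
    }
}
print "$_\n" for @report;
```

Building the index once per sample avoids the nested loop over both arrays that the commented-out code in the question was heading towards.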

How do I sort hash of hashes by value using perl?

I have this code
use strict;
use warnings;
my %hash;
$hash{'1'}= {'Make' => 'Toyota','Color' => 'Red',};
$hash{'2'}= {'Make' => 'Ford','Color' => 'Blue',};
$hash{'3'}= {'Make' => 'Honda','Color' => 'Yellow',};
foreach my $key (keys %hash){
my $a = $hash{$key}{'Make'};
my $b = $hash{$key}{'Color'};
print "$a $b\n";
}
And this output:
Toyota Red
Honda Yellow
Ford Blue
I need help sorting it by Make.
#!/usr/bin/perl
use strict;
use warnings;
my %hash = (
1 => { Make => 'Toyota', Color => 'Red', },
2 => { Make => 'Ford', Color => 'Blue', },
3 => { Make => 'Honda', Color => 'Yellow', },
);
# if you still need the keys...
foreach my $key ( #
sort { $hash{$a}->{Make} cmp $hash{$b}->{Make} } #
keys %hash
)
{
my $value = $hash{$key};
printf( "%s %s\n", $value->{Make}, $value->{Color} );
}
# if you don't...
foreach my $value ( #
sort { $a->{Make} cmp $b->{Make} } #
values %hash
)
{
printf( "%s %s\n", $value->{Make}, $value->{Color} );
}
print "$_->{Make} $_->{Color}\n" for
sort {
$b->{Make} cmp $a->{Make}
} values %hash;
plusplus is right... an array of hashrefs is likely a better choice of data structure. It's more scalable too; add more cars with push:
my @cars = (
{ make => 'Toyota', Color => 'Red' },
{ make => 'Ford' , Color => 'Blue' },
{ make => 'Honda' , Color => 'Yellow' },
);
foreach my $car ( sort { $a->{make} cmp $b->{make} } @cars ) {
foreach my $attribute ( keys %{ $car } ) {
print $attribute, ' : ', $car->{$attribute}, "\n";
}
}
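The point about push can be sketched like this. The extra Audi record is hypothetical and is there only to show that adding a car needs no new top-level key; the key names follow the question's Make and Color spelling.

```perl
use strict;
use warnings;

my @cars = (
    { Make => 'Toyota', Color => 'Red' },
    { Make => 'Ford',   Color => 'Blue' },
    { Make => 'Honda',  Color => 'Yellow' },
);

# Hypothetical extra record, added only to show that push needs no new key.
push @cars, { Make => 'Audi', Color => 'Black' };

# Sort the hashrefs by Make, ascending, and print one car per line.
my @by_make = sort { $a->{Make} cmp $b->{Make} } @cars;
print "$_->{Make} $_->{Color}\n" for @by_make;
```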