How to get Data::Diver to produce arrays? - perl

The below script will output
$VAR1 = {
'tank' => {
'fs' => {
'fs2b' => undef,
'fs2a' => undef,
'fs2c' => undef
}
}
};
where I really wanted a hash of hash of array like this
$VAR1 = {
'tank' => {
'fs' => [
'fs2a',
'fs2b',
'fs2c'
]
}
};
Question
How would that be done with Data::Diver?
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Data::Diver 'DiveRef';
my #array = ("tank", "tank/fs", "tank/fs/fs2a", "tank/fs/fs2b", "tank/fs/fs2c");
my %hash = ();
foreach my $element (#array) {
DiveRef( \%hash, \( split /\//, $element ) );
}
print Dumper \%hash;
(Code provided by ysth in this answer to another question.)
Update
The array in the code is just an example. The real array have ~100 elements, so the solution can't be hard coded.

DiveVal(\%data, 'tank', 'fs', 0) = 'fs2a';
DiveVal(\%data, 'tank', 'fs', 1) = 'fs2b';
DiveVal(\%data, 'tank', 'fs', 2) = 'fs2c';
or
push #{ DiveVal(\%data, 'tank', 'fs') }, 'fs2a';
push #{ DiveVal(\%data, 'tank', 'fs') }, 'fs2b';
push #{ DiveVal(\%data, 'tank', 'fs') }, 'fs2c';
To get the desired data structure from "tank", "tank/fs", "tank/fs/fs2a", "tank/fs/fs2b", "tank/fs/fs2c", extra information is needed. For you example, you could have the understanding that the data structure is always going to be a HoHoA.
my #data = ("tank", "tank/fs", "tank/fs/fs2a", "tank/fs/fs2b", "tank/fs/fs2c");
my %data;
for (#data) {
my #parts = split qr{/};
if (#parts < 3) {
DiveVal(\%data, map \$_, #parts);
} else {
my $val = pop(#parts);
push #{ DiveVal(\%data, map \$_, #parts) }, $val;
}
}
But which such a limited structure, there's no reason to use Data::Diver. It would be far faster to avoid it.
my #data = ("tank", "tank/fs", "tank/fs/fs2a", "tank/fs/fs2b", "tank/fs/fs2c");
my %data;
for (#data) {
my #parts = split qr{/};
if (#parts == 1) { \( $data{$parts[0]} ); }
elsif (#parts == 2) { \( $data{$parts[0]}{$parts[1]} ); }
else { push #{ $data{$parts[0]}{$parts[1]} }, $parts[2]; }
}
You might even be able to use
my #data = ("tank", "tank/fs", "tank/fs/fs2a", "tank/fs/fs2b", "tank/fs/fs2c");
my %data;
for (#data) {
my #parts = split qr{/};
push #{ $data{$parts[0]}{$parts[1]} }, $parts[2] if #parts == 3;
}

Related

Pass hash to subroutine inside a subroutine already passed that hash

I am working with passing hashes to various subroutines, and I was wondering how to pass a hash to a subroutine and then pass the same hash inside that subroutine to a different subroutine and so on.
For example, the following code works fine.
use strict;
use warnings;
my %hash = (
key1 => 'value1',
key2 => 'value2',
key3 => 'value3',
key4 => '',
);
print %hash, "\n";
check_it(\%hash);
sub check_it {
my $params = shift;
foreach(keys %{$params}){
if($params->{$_}) {
print "'$_' defined as '$params->{$_}'\n";
}
else {
print "'$_' not defined as '$params->{$_}'. Deleting it.\n";
#delete $params->{$_};
$params->{$_} = 'null';
}
}
for ( my $i = 0 ; $i < 7 ; $i++ ) {
print "looping\n";
&check_tags_again(\%hash);
}
}
sub check_tags_again {
my $hash_now = shift;
#check again...
foreach(keys %{$hash_now}){
print "An element of hash: ", $hash_now->{$_}, "\n";
#if(defined $hash_now->{$_}){ delete $hash_now->{$_};}
}
&check_tags_thrice(\%hash);
}
sub check_tags_thrice {
my $hash_now = shift;
#check again...
foreach(keys %{$hash_now}){
print "An element of hash: ", $hash_now->{$_}, "\n";
#if(defined $hash_now->{$_}){ delete $hash_now->{$_};}
}
}
print "final hash:", %hash, "\n";
BUT, when I run the code that follows:
use strict;
use warnings;
use Data::Dumper;
sub process_data {
my $group_size = 10;
my %HoA = (
flintstones => [ "fred", "barney" ],
jetsons => [ "george", "jane", "elroy" ],
simpsons => [ "homer", "marge", "bart" ],
);
&delete_stuff( \%HoA, $group_size );
print "New group:\n";
print Dumper( \%HoA );
undef %HoA;
}
sub delete_stuff {
my $HoARef = shift;
my $group_size = shift;
print "group size in sub $group_size\n";
for ( my $j = 0 ; $j < $group_size ; $j++ ) {
my $dlted = &delete_other_stuff( \%HoA, $j );
print "deleted? '$dlted'\n";
if ( $dlted == 0 ) {
&presence_check( \%HoA, $j );
}
for ( my $i = 0 ; $i < $group_size ; $i++ ) {
}
}
}
sub delete_other_stuff {
my $HoAref = shift;
my $Dex = shift;
return $deleted;
}
sub presence_check {
my $HoAreF = shift;
my $DeX = shift;
}
I get:
Global symbol "%HoA" requires explicit package name at x.pl line 32.
Global symbol "%HoA" requires explicit package name at x.pl line 35.
Execution of x.pl aborted due to compilation errors.
I'm confused because I think it's doing the same thing as the first, but now it claims that %HoA was never initialized.
In delete_stuff, you don't have %HoA, you have $HoARef. If all the subs are expecting a reference to a hash, then you can just use it:
for ( my $j = 0 ; $j < $group_size ; $j++ ) {
my $dlted = &delete_other_stuff( $HoARef, $j );
print "deleted? '$dlted'\n";
if ( $dlted == 0 ) {
&presence_check( $HoARef, $j );
}
...
}
By the way, we're closing on 20 years of Perl 5. There is no reason to call a sub with explicitly passed parameters with an &, which is a Perl 4 holdover.

How can I flatten the arguments to my subroutine into an array?

Consider following script:
use strict;
use Data::Dumper;
my #arr=('1A','2A');
my $arr_ref=['1','2'];
sub routine1
{
my #arr=#_;
print Dumper(\#arr);
}
routine1(#arr,'one_A');
sub routine2
{
my $arr_ref=[#_];
print Dumper($arr_ref);
}
routine2($arr_ref,'one');
routine1 is using #arr and routine2 is using $arr_ref.
routine1 prints the following:
$VAR1 = [
'1A',
'2A',
'one_A'
];
routine2 prints following:
$VAR1 = [
[
'1',
'2'
],
'one'
];
I want to continue using #_ and arr_ref in routine2 but want to come up with below output:
$VAR1 = [
'1',
'2'
'one'
];
Can someone suggest the way out?
Using the function ref you can see if a scalar is a reference (and if so, which type). In a simplistic case where only array references will be passed you can simply use this to flatten the inputs.
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
sub test {
my #arr = map { ref() ? #$_ : $_ } #_;
print Dumper \#arr;
}
test( ['a', 'b'], 1 );
As a side benefit, this code will die with a message if a reference to another type is passed, since you attempt to deference as an array. If you need to handle more, you will need to check the reference type. This starts to build in complexity quickly.
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
sub test {
my #arr = map {
my $type = ref;
if ( ! $type ) {
$_;
} elsif ( $type eq 'ARRAY' ) {
#$_;
} elsif ( $type eq 'HASH' ) {
%$_;
} else {
()
}
} #_;
print Dumper \#arr;
}
test( ['a', 'b'], { p => 'q' }, 1 );
By returning an empty list for other reference types I silently ignore all other reference types. Or perhaps you would rather force stringification on other reference types.
...
} else {
"$_";
}
...
test( ['a','b'], sub{}, bless({},'MyClass'), 1 );
Of couse which of these handlings to use depends on you use case.
Just wrote this the other day at work.
sub flatten {
return map { ref($_) ? flatten(#{$_}) : ($_) } #_;
}
This program shows a subroutine flatten that will flatten a mixed list of simple data and array references, nested to any level.
use strict;
use warnings;
use Data::Dump;
my #arr = qw/ 1A 2A /;
my $arr_ref = [1, 2];
sub flatten;
routine1(#arr, 'one_A');
routine2($arr_ref, 'one');
sub routine1 {
my #arr=#_;
dd \#arr;
}
sub routine2 {
my $arr_ref = [flatten #_];
dd $arr_ref;
}
sub flatten {
my $i = 0;
while ($i < #_) {
my $item = $_[$i];
if (ref $item eq 'ARRAY') {
splice #_, $i, 1, #$item;
}
else {
++$i;
}
}
#_;
}
output
["1A", "2A", "one_A"]
[1, 2, "one"]

Perl: multidimensional hash

suppose I have the following data
cluster1:d(A),f(C)s,(A)
cluster2:r(D),h(D),f(A)
I want this out put
output:
cluster1:A->2
cluster1:C->1
cluster2:D->2
cluster2:A->1
here is my try,but it is not correct , the part that I am trying to count characters has a problem that I cant fix
the code is a part of very big code ,and I want exactly multidimensional hash
use strict;
use Data::Dumper;
my %count;
while (<DATA>) {
my %HoH;
my ( $cluster, $ch ) = split (/:/,$_);
$HoH{$cluster}={split /[()]+/,$ch};
for my $clust ( keys %HoH ) {
for my $character ( keys %{ $HoH{$clust} } ) {
$count{$clust}{$HoH{$clust}{$character}}++;
}
}
}
print Dumper(\%count);
foreach my $name (sort keys %count) {
foreach my $subject (keys %{$count{$name}}) {
print "$name:$subject->$count{$name}{$subject}\n";
}
}
DATA
cluster1:d(A),f(C)s,(A)
cluster2:r(D),h(D),f(A)
It would be nice if you try to understand the below code so that you can get an idea for solving the problem:-
use strict;
use Data::Dumper;
my $data = "cluster1:A,B,C,A";
my %cluster = ();
my ($cluster_key, $cluster_val ) = split (':', $data);
my #cluster1_data = split(',', $cluster_val);
foreach my $val ( #cluster1_data ) {
$cluster{$cluster_key}{$val}++;
}
print Dumper(\%cluster);
foreach my $clus ( keys %cluster ) {
my $clus_ref = $cluster{$clus};
foreach my $clu ( keys %{ $clus_ref } ){
my $count = $clus_ref->{$clu};
print"$clus:$clu->$count\n";
}
}
Output:
$VAR1 = {
'cluster1' => {
'A' => 2,
'C' => 1,
'B' => 1
}
};
cluster1:A->2
cluster1:C->1
cluster1:B->1
What do you expect $count{$cluster}{$characters}+=1; to do exactly? You have to loop over your input data to populate %count if you expect to get the desired result:
while (<DATA>) {
next unless /^(cluster\d+):(.+)/;
$count{$1}{$_}++ for split/,/, $2;
}
If you also add sort to the second foreach you'll get the output you want.
EDIT: This solves the question for the updated input and requirements:
my %count;
while (<DATA>) {
next unless /^(cluster\d+):(.+)/;
my $cluster = $1;
$count{$cluster}{$_}++ for $2 =~ /\((\w)\)/g;
}
for my $key (sort keys %count) {
for my $value (sort {
$count{$key}{$b} <=> $count{$key}{$a}
} keys %{$count{$key}}) {
print "$key:$value->$count{$key}{$value}\n";
}
}

Getting values from a hash by order defined in another list

Looking for a way, how to get the values from a hash, in an order defined by another list.
The "demo" code (real values are different):
use 5.014;
use warnings;
my $href = {
'Long one' => 'v1',
'xxx two' => 'v2',
'another3' => 'v3',
'xfour' => 'v4',
'some5' => undef,
};
#keys from the $href in defined order
my #order = ('Long one', 'xfour', 'nono', 'another3', 'some5', 'xxx two');
#in the real code:
#my $href = some_sub(......); my #order = another_sub(....);
#cleanup the #order form undefined values
#order = grep { exists $href->{$_} && defined $href->{$_} } #order;
#my input
while(<DATA>) {
chomp;
#filter out nonexistent keys and undefined values
my #defined_data = grep { exists $href->{$_} && defined $href->{$_} } split '/';
my $str = "xx";
$str = join('/',
map { $href->{$_} } some_my_sort(#defined_data)
) if #defined_data;
say $str;
}
sub some_my_sort {
my(#list) = #_;
# "somewhat"sort the #list in order defined by #order
# haven't any idea how to do this :(
# __HERE NEED HELP__ to sort the #list, to the order defined in the #order
#and get only first two values. if exists only one value, return only one
if($#list > 0) {
return ($list[0], $list[1]);
}
else {
return($list[0]);
}
}
__DATA__
another3/some5/Long one/xfour/xxx two
xxx two/blabla/some5/another3/xfour
some5
notexists/some5/xxx two/Long one
some5/another3
for the above input want get the next output:
v1/v4
v4/v3
xx
v1/v2
v3
Form #ikegami solution:
use 5.014;
use warnings;
my $href = { 'Long one' => 'v1', 'xxx two' => 'v2', 'another3' => 'v3', 'xfour' => 'v4', 'some5' => undef, };
my #order = ('Long one', 'xfour', 'nono', 'another3', 'some5', 'xxx two');
#order = grep { exists $href->{$_} && defined $href->{$_} } #order;
my %order = map { $order{$_} => $_ } 0..$#order;
while (<DATA>) {
chomp;
my #keys = grep { defined $href->{$_} } split '/';
#keys = sort { $order{$a} <=> $order{$b} } #keys;
splice(#keys, 2) if #keys > 2;
#keys = 'xx' if !#keys;
say join '/', #{$href}{ #keys };
}
get the next - error - and don't really understand why:
Global symbol "%order" requires explicit package name at ike line 8.
Execution of ike aborted due to compilation errors.
After defining #order, define %order_h:
my %order_h;
my $i = 0;
$order_h{$_} = $i++ for #order;
Then, instead of the comment about sorting, add this line:
#list = sort { $order_h{$a} <=> $order_h{$b} } #list;
Here's a cleaned-up version of the entire code:
my %order = map { $order[$_] => $_ } 0..$#order;
while (<DATA>) {
chomp;
my #keys = grep { defined $href->{$_} } split '/';
#keys = sort { $order{$a} <=> $order{$b} } #keys;
splice(#keys, 2) if #keys > 2;
say join '/', #keys ? #{$href}{ #keys } : 'xx';
}

Perl Working On Two Hash References

I would like to compare the values of two hash references.
The data dumper of my first hash is this:
$VAR1 = {
'42-MG-BA' => [
{
'chromosome' => '19',
'position' => '35770059',
'genotype' => 'TC'
},
{
'chromosome' => '2',
'position' => '68019584',
'genotype' => 'G'
},
{
'chromosome' => '16',
'position' => '9561557',
'genotype' => 'G'
},
And the second hash is similar to this but with more hashes in the array. I would like to compare the genotype of my first and second hash if the position and the choromosome matches.
map {print "$_= $cave_snp_list->{$_}->[0]->{chromosome}\n"}sort keys %$cave_snp_list;
map {print "$_= $geno_seq_list->{$_}->[0]->{chromosome}\n"}sort keys %$geno_seq_list;
I could do that for the first array of the hashes.
Could you help me in how to work for all the arrays?
This is my actual code in full
#!/software/bin/perl
use strict;
use warnings;
use Getopt::Long;
use Benchmark;
use Config::Config qw(Sequenom.ini);
useDatabase::Conn;
use Data::Dumper;
GetOptions("sam=s" => \my $sample);
my $geno_seq_list = getseqgenotypes($sample);
my $cave_snp_list = getcavemansnpfile($sample);
#print Dumper($geno_seq_list);
print scalar %$geno_seq_list, "\n";
foreach my $sam (keys %{$geno_seq_list}) {
my $seq_used = $geno_seq_list->{$sam};
my $cave_used = $cave_snp_list->{$sam};
print scalar(#$geno_seq_list->{$_}) if sort keys %$geno_seq_list, "\n";
print scalar(#$cave_used), "\n";
#foreach my $seq2com (# {$seq_used } ){
# foreach my $cave2com( # {$cave_used} ){
# print $seq2com->{chromosome},":" ,$cave2com->{chromosome},"\n";
# }
#}
map { print "$_= $cave_snp_list->{$_}->[0]->{chromosome}\n" } sort keys %$cave_snp_list;
map { print "$_= $geno_seq_list->{$_}->[0]->{chromosome}\n" } sort keys %$geno_seq_list;
}
sub getseqgenotypes {
my $snpconn;
my $gen_list = {};
$snpconn = Database::Conn->new('live');
$snpconn->addConnection(DBI->connect('dbi:Oracle:pssd.world', 'sn', 'ss', { RaiseError => 1, AutoCommit => 0 }),
'pssd');
#my $conn2 =Database::Conn->new('live');
#$conn2->addConnection(DBI->connect('dbi:Oracle:COSI.world','nst_owner','nst_owner', {RaiseError =>1 , AutoCommit=>0}),'nst');
my $id_ind = $snpconn->execute('snp::Sequenom::getIdIndforExomeSample', $sample);
my $genotype = $snpconn->executeArrRef('snp::Sequenom::getGenotypeCallsPosition', $id_ind);
foreach my $geno (#{$genotype}) {
push #{ $gen_list->{ $geno->[1] } }, {
chromosome => $geno->[2],
position => $geno->[3],
genotype => $geno->[4],
};
}
return ($gen_list);
} #end of sub getseqgenotypes
sub getcavemansnpfile {
my $nstconn;
my $caveman_list = {};
$nstconn = Database::Conn->new('live');
$nstconn->addConnection(
DBI->connect('dbi:Oracle:CANP.world', 'nst_owner', 'NST_OWNER', { RaiseError => 1, AutoCommit => 0 }), 'nst');
my $id_sample = $nstconn->execute('nst::Caveman::getSampleid', $sample);
#print "IDSample: $id_sample\n";
my $file_location = $nstconn->execute('nst::Caveman::getCaveManSNPSFile', $id_sample);
open(SNPFILE, "<$file_location") || die "Error: Cannot open the file $file_location:$!\n";
while (<SNPFILE>) {
chomp;
next if /^>/;
my #data = split;
my ($nor_geno, $tumor_geno) = split /\//, $data[5];
# array of hash
push #{ $caveman_list->{$sample} }, {
chromosome => $data[0],
position => $data[1],
genotype => $nor_geno,
};
} #end of while loop
close(SNPFILE);
return ($caveman_list);
}
The problem that I see is that you're constructing a tree for generic storage of data, when what you want is a graph, specific to the task. While you are constructing the record, you could also be constructing the part that groups data together. Below is just one example.
my %genotype_for;
my $record
= { chromosome => $data[0]
, position => $data[1]
, genotype => $nor_geno
};
push #{ $gen_list->{ $geno->[1] } }, $record;
# $genotype_for{ position }{ chromosome }{ name of array } = genotype code
$genotype_for{ $data[1] }{ $data[0] }{ $sample } = $nor_geno;
...
return ( $caveman_list, \%genotype_for );
In the main line, you receive them like so:
my ( $cave_snp_list, $geno_lookup ) = getcavemansnpfile( $sample );
This approach at least allows you to locate similar position and chromosome values. If you're going to do much with this, I might suggest an OO approach.
Update
Assuming that you wouldn't have to store the label, we could change the lookup to
$genotype_for{ $data[1] }{ $data[0] } = $nor_geno;
And then the comparison could be written:
foreach my $pos ( keys %$small_lookup ) {
next unless _HASH( my $sh = $small_lookup->{ $pos } )
and _HASH( my $lh = $large_lookup->{ $pos } )
;
foreach my $chrom ( keys %$sh ) {
next unless my $sc = $sh->{ $chrom }
and my $lc = $lh->{ $chrom }
;
print "$sc:$sc";
}
}
However, if you had limited use for the larger list, you could construct the specific case
and pass that in as a filter when creating the longer list.
Thus, in whichever loop creates the longer list, you could just go
...
next unless $sample{ $position }{ $chromosome };
my $record
= { chromosome => $chromosome
, position => $position
, genotype => $genotype
};
...