Perl - "Complex" Data Structure - perl

I'm trying to get a workable data structure that I can pull the element values from in a sensible fashion. Just having great difficulty working with the data once its in the structure. This is how the struct is built:
sub hopCompare
{
my %count;
my %master;
my $index = 0;
foreach my $objPath (#latest) #get Path object out of master array
{
my #path = #{$objPath->_getHopList()}; #dereferencing
my $iter = 0;
foreach my $hop (#path)
{
++$count{$hop}->{FREQ};
$count{$hop}->{INDEX} = $index;
$count{$hop}->{NODE} = $hop;
$index++;
}
$index = 0;
}
foreach my $element( keys %count )
{
if (defined($count{$element}->{NODE}))
{
my $curr = $count{$element}->{INDEX};
my $freq = $count{$element}->{FREQ};
if (($freq > 1) || ($count{$element}->{INDEX} =~ /[0-1]/))
{
push #{ $master{$curr} }, {$count{$element}->{NODE}, {FREQ => $count{$element}->{FREQ}}};
}
print "$element = $count{$element}\n";
print "$element Index = $count{$element}->{INDEX}\n";
}
}
print "\n Master contains: \n" . Dumper (%master);
if (%master){return %master;} else {die "NO FINAL HOPS MATCHED";}
}
Producing this structure:
%Master contains:
$VAR1 = '4';
$VAR2 = [
{
'1.1.1.2' => {
'FREQ' => 2
}
}
];
$VAR3 = '1';
$VAR4 = [
{
'1.1.1.9' => {
'FREQ' => 5
}
},
{
'1.1.1.8' => {
'FREQ' => 1
}
}
];
{truncated}
Although ideally the structure should look like this but I had even less joy trying to pull data out at sub identifyNode:
$VAR1 = {
'1' => [
{
'1.1.1.9' => {
'FREQ' => 5
}
},
{
'1.1.5.8' => {
'FREQ' => 1
}
}
],
Then to get back at the data in another method I'm using:
sub identifyNode
{
my %hops = %{$_[0]};
my $i = 0;
foreach my $h ( keys %hops ) #The HOP-INDEX is the key
{
print "\n\$h looks like \n" . Dumper ($hops{$h});
my %host = %{ $hops{$h}[0] }; #Push the first HASH in INDEX to the %host HASH
foreach my $hip (keys %host)
{
my $corelink = `corelinks $hip`;
my ($node) = $corelink =~ /([a-z0-9-]+),[a-z0-9-\/]+,$hip/s;
print "\n\t\t\tHostname is $node\n";
}
$i++;
}
}
This then generates:
$h looks like
$VAR1 = [
{
'1.1.1.2' => {
'FREQ' => 2
}
}
];
Hostname is blabla-bla-a1
$h looks like
$VAR1 = [
{
'1.1.1.9' => {
'FREQ' => 5
}
},
{
'1.1.1.8' => {
'FREQ' => 1
}
}
];
Hostname is somew-some-a1
So for each hash in $h only the topmost host gets evaluated and hostname returned. This is because it is told to do so by the [0] in line:
my %host = %{ $hops{$h}[0] };
I've played around with different data structures and de-referencing the structure a multitude of ways and this is the only halfway house I've found...
(The IPs have been obfuscated so are not consistent in my examples)

Thanks for your advice it got me halfway there. It works now (in still somewhat a convoluted fashion!) :
sub identifyNode
{
my %hops = %{$_[0]};
my $i = 0;
my #fin_nodes;
my $hindex;
foreach my $h ( keys %hops ) #The HOP-INDEX is the key
{
$hindex = $h;
foreach my $e (#{$hops{$h}}) #first part of solution credit Zdim
{
my #host = %{ $e }; #second part of solution
my $hip = $host[0];
my $corelink = `corelinks $hip`;
my ($node) = $corelink =~ /([a-z0-9-]+),[a-z0-9-\/]+,$hip/s;
print "\n\t\t\tHostname is $node\n";
push (#fin_nodes, [$node, $hindex, $e->{$hip}->{FREQ}]);
}
$i++;
}
return (\#fin_nodes);
}
Am I brave enough to add the data as a hash to #fin_nodes.. hmm

Related

How to pass a tree data structure by reference in Perl?

I am writing a script to solve very basic systems of equations. I convert the equations into binary expression trees, isolate the variable that I want the value of, and then do substitutions.
This is where I have a problem, I have a function "substitution" that walks the binary expression tree of the left side of the equation I want substituted. And when I found the variable to be substituted, I replace the node with the expression tree of another equation.
But when I try to return the new tree, my susbstitution is not there.
It is obviously a pass-by-reference / pass-by-value problem but I cannot find the way to solve it.
Here's a side script that shows the part which doesn't work:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
sub inorder {
my $expression = shift;
my $node = $expression;
if ($node->{type} eq "operator") {
print "(";
inorder($node->{left});
print $node->{value};
inorder($node->{right});
print ")";
}
else {
print $node->{value};
}
}
sub substitution {
my ($inserted_equation, $master_equation) = #_;
my $inserted_expression = $inserted_equation->{right_side};
my $insertion_point = $inserted_equation->{left_side}->{value};
my $master_expression = $master_equation->{right_side};
my #stack_tree_walk;
my $node = $master_expression;
push #stack_tree_walk, {$node->%*, left_visited => 0, side=> "left"};
while(#stack_tree_walk) {
if ($node->{type} eq "variable" and $node->{value} eq $insertion_point) {
foreach (#stack_tree_walk) {
}
# print $node->{value};
# print Dumper $inserted_expression;
$node = $inserted_expression; # WORKS
# print Dumper $node; # WORKS
# print Dumper $master_expression; # DOES NOT WORK
pop #stack_tree_walk;
$node = $stack_tree_walk[-1];
}
elsif ($node->{type} eq "operator") {
if (not $stack_tree_walk[-1]->{left_visited}) {
$stack_tree_walk[-1]->{left_visited} = 1;
$node = $node->{left};
push #stack_tree_walk, {$node->%*, left_visited => 0, side=> "left"};
}
elsif ($node->{side} eq "left") {
$node = $node->{right};
$stack_tree_walk[-1]->{side} = "right";
push #stack_tree_walk, {$node->%*, left_visited => 0, side=> "left"};
}
else {
pop #stack_tree_walk;
$node = $stack_tree_walk[-1];
}
}
else {
pop #stack_tree_walk;
$node = $stack_tree_walk[-1];
}
}
return {right_side=>$master_expression, left_side=>$master_equation->{left_side}};
}
my $equation = {left_side => { type=> "variable",
value=> "y"},
right_side=> { type=> "operator",
value=> "*",
left=> {type=> "variable", value=> "a"},
right=> {type=> "variable", value=> "b"} }
};
my $insertion = {left_side => { type=> "variable" ,
value=> "a" },
right_side=> { type=> "operator",
value=> "+",
left=> {type=> "variable", value=> "x"},
right=> {type=> "variable", value=> "y"} }
};
$,="";
$\="";
print "equations before substitution\n";
inorder($equation->{left_side});
print "=";
inorder($equation->{right_side});
print "\n";
inorder($insertion->{left_side});
print "=";
inorder($insertion->{right_side});
print "\n";
print "------------------\n";
$,="\n";
$\="\n\n";
my $final = substitution($insertion, $equation);
$,="";
$\="";
print "------------------\n";
print "equation substituted\n";
inorder($final->{left_side});
print "=";
inorder($final->{right_side});
print "\n";
Here is the OUPUT:
equations before substitution
y=(a*b)
a=(x+y)
equation substituted
y=(a*b) <==== this is the ERROR
y=((x+y)*b) <==== this should be the RIGHT result
I hope someone can show me which part is wrong.
Thank you.
$node is a essentially a pointer into the structure. Your code simply sets $node to a different pointer, i.e. $inserted_expression. You don't change the structure this way, you only change a local variable $node to point to different things. Basically you does this:
$struct = { foo => { bar => 1 } };
$node = $struct->{foo}; # points at { bar => 1 } in $struct
$node = { bar => 2 } # points at { bar => 2 } and not longer into $struct
print(Dumper($struct)); # unchanged
If you want to change the value you in the struct you need to take a reference to the value and not just take the value, i.e.
$struct = { foo => { bar => 1 } };
$node = \$struct->{foo}; # reference to value of { foo => ... }, currently { bar => 1 }
$$node = { bar => 2 } # changes value of { foo => ... } to { bar => 2 }
print(Dumper($struct)); # changed

How to delete an item from a Linked List in Perl?

I must be missing something in my implementation of remove item from a linked list in Perl.
The code below would remove all the nodes with value = 5.
But when I call remove_fives([[[[[], 3], 5], 1]]), it is deleting the last element, 3, instead of 5.
Any Help ?
sub remove_fives {
my $list = shift;
my $node = $list->[0];
while ( $node->[0] ) {
my $last = $node;
( $node, my $value ) = #$node;
if ($value == 5) {
$last->[0] = $node->[0];
}
}
return $list;
}
This is my following code that I used to test:
sub list_str {
my $list = shift;
my $result = '';
my $node = $list->[0];
my $first = 1;
while ( $node->[0] ) {
( $node, my $value ) = #$node;
if ( !$first ) {
$result .= ';';
}
$result .= $value;
$first = 0;
}
return $result;
}
my $list = [[[[[], 3], 5], 1]];
print "\n Original linked list: " . list_str($list); # 1;5;3
print "\n After remove: " . list_str(remove_fives($list)); # 1;5
$last points to the current node, and $node points to the next node.
$last should point to the previous node, and $node should point to the current node.
Conveniently, the list itself has the same format as a node, so we don't need trickery.
use Data::Dumper qw( Dumper );
sub remove_fives {
my $list = shift;
my $last = $list;
while (1) {
my $node = $last->[0];
last if !$node;
if ($node->[1] == 5) {
$last->[0] = $node->[0];
} else {
$last = $node;
}
}
}
my $list = [[[[undef, 3], 5], 1]];
remove_fives($list);
local $Data::Dumper::Indent = 0;
local $Data::Dumper::Terse = 1;
print(Dumper($list), "\n"); # [[[undef,3],1]]
If the list didn't have the same format as a node, this is what the code would look like:
sub delete_node {
my $node_ptr_ptr = shift;
$$node_ptr_ptr = $$node_ptr_ptr->next;
}
sub remove_fives {
my $list = shift;
my $node_ptr_ptr = \( $list->{head} );
while ($$node_ptr_ptr) {
if ($$node_ptr_ptr->{val} == 5) {
delete_node($$node_ptr_ptr);
} else {
$node_ptr_ptr = \( $$node_ptr_ptr->{next} );
}
}
}

Get value from hash of hashes

I would like to get value from hash of hashes but i do not. My code is :
sub test {
my $filename = $_[0];
open INFILE, ${filename} or die $!;
my %hashCount;
my #firstline = split('\t',<INFILE>);
shift(#firstline);
while (my $line = <INFILE>)
{
my %temp;
chomp($line);
my #line = split('\t', $line);
foreach my $cpt (1..$#line) {
$temp{$firstline[$cpt-1]}=$line[$cpt];
}
$hashCount{$line[0]}={%temp};
}
return %hashCount;
}
sub get_hash_of_hash {
my $h = shift;
foreach my $key (keys %$h) {
if( ref $h->{$key}) {
get_hash_of_hash( $h->{$key} );
}
else {
say $h->{$key};
}
}
}
And when i display my hash :
$VAR10679 = 'M00967_43_1106_2493_14707';
$VAR10680 = {
'A' => '1',
'B' => '0',
'C' => '1',
'D' => '0',
'E' => '0'
};
My first function return my hash of hashes and i get my specific value with the second function.
So I want to get value like that :
my %hashTest = test("FILE.txt");
get_hash_of_hash(%hashTest,"M00967_43_1106_2493_14707","A")
//return value '1'
You can either access nested elements like
$hash{keyA}{keyB}
or we can write a function that walks the data structure, like
sub walk {
my ($hashref, #keys) = #_;
my $pointer = $hashref;
for my $key (#keys) {
if (exists $pointer->{$key}) {
$pointer = $pointer->{$key};
} else {
die "No value at ", join "->", #keys;
}
}
return $pointer;
}
which can be used like
my %hash = (
'M00967_43_1106_2493_14707' => {
'A' => '1',
'B' => '0',
'C' => '1',
'D' => '0',
'E' => '0'
},
);
say walk(\%hash, 'M00967_43_1106_2493_14707', 'A');
Note: When using Data::Dumper, pass references to the Dump function:
print Dump \%hash; # not print Dump %hash
This is neccessary to show the correct data structure.
Your hash holds references to hashes.
You can access them like this:
$hashTest{'M00967_43_1106_2493_14707'}{'A'};
See perlref for more info
Use this subroutine..
sub get_hash_of_hash {
my $h = shift;
foreach my $key (keys %$h) {
if( ref $h->{$key}) {
get_hash_of_hash( $h->{$key} );
}
else {
print $h->{$key};
}
}
}

what is the correct loop/conditional option for not finding a variable?

I am searching through three text files for one of four specific gene names (stored in $var#). When it is found, it takes the value found after the gene name and adds it to a count. We then average the value by taking total $count_exp# and dividing by the number of appearances within all files.
What is the proper way to let the user know when a gene name is not found in each file? I'm having difficulties handling the flow of this loop/conditional.
Here is a snippet of code that handles one of the three text files....
foreach $hyperosmotic(#hyperosmotic)
{
#hyperosmotic1=split(/\t/,$hyperosmotic);
$name=$hyperosmotic1[0];
$exp=$hyperosmotic1[1];
chomp $name;
chomp $exp;
if ($name eq $var1)
{
$count_exp1 = $count_exp1 + $exp;
$count_var1 = ++$count_var1;
}
elsif ($name eq $var2)
{
$count_exp2 = $count_exp2 + $exp;
$count_var2 = ++$count_var2;
}
elsif ($name eq $var3)
{
$count_exp3 = $count_exp3 + $exp;
$count_var3 = ++$count_var3;
}
elsif ($name eq $var4)
{
$count_exp4 = $count_exp4 + $exp;
$count_var4 = ++$count_var4;
}
}
You basically want to use arrays:
(and use strict; use warnings;)
my #count_var = (0)x4;
my #count_exp = (0)x4;
my #var = ($var1, $var2, ...);
HYPEROSMOTIC:
for my $hyperosmotic (#hyperosmotic) {
my ($name, $exp) = split /\t/, $hyperosmotic;
for my $i (0 .. $#var) {
if ($name eq $var[$i]) {
$count_exp[$i] += $exp;
$count_var[$i]++;
next HYPEROSMOTIC; # jump into next iteration of the labeled loop
}
}
# this code is only reached if no var matched:
die qq[I don't have a var for name "$name"];
# That just threw a fatal error. You may want to do something different.
}
You could improve efficiency by using hashes:
my %counts = (
$var1 => {exp => 0, var => 0},
$var2 => {exp => 0, var => 0},
$var3 => {exp => 0, var => 0},
$var4 => {exp => 0, var => 0},
);
for my $hyperosmotic (#hyperosmotic) {
my ($name, $exp) = split ...;
if (my $count = $counts{$name}) {
$count->{exp} += $exp;
$count->{var}++;
} else {
die qq[I don't have a var for name "$name"];
}
}

Perl Working On Two Hash References

I would like to compare the values of two hash references.
The data dumper of my first hash is this:
$VAR1 = {
'42-MG-BA' => [
{
'chromosome' => '19',
'position' => '35770059',
'genotype' => 'TC'
},
{
'chromosome' => '2',
'position' => '68019584',
'genotype' => 'G'
},
{
'chromosome' => '16',
'position' => '9561557',
'genotype' => 'G'
},
And the second hash is similar to this but with more hashes in the array. I would like to compare the genotype of my first and second hash if the position and the choromosome matches.
map {print "$_= $cave_snp_list->{$_}->[0]->{chromosome}\n"}sort keys %$cave_snp_list;
map {print "$_= $geno_seq_list->{$_}->[0]->{chromosome}\n"}sort keys %$geno_seq_list;
I could do that for the first array of the hashes.
Could you help me in how to work for all the arrays?
This is my actual code in full
#!/software/bin/perl
use strict;
use warnings;
use Getopt::Long;
use Benchmark;
use Config::Config qw(Sequenom.ini);
useDatabase::Conn;
use Data::Dumper;
GetOptions("sam=s" => \my $sample);
my $geno_seq_list = getseqgenotypes($sample);
my $cave_snp_list = getcavemansnpfile($sample);
#print Dumper($geno_seq_list);
print scalar %$geno_seq_list, "\n";
foreach my $sam (keys %{$geno_seq_list}) {
my $seq_used = $geno_seq_list->{$sam};
my $cave_used = $cave_snp_list->{$sam};
print scalar(#$geno_seq_list->{$_}) if sort keys %$geno_seq_list, "\n";
print scalar(#$cave_used), "\n";
#foreach my $seq2com (# {$seq_used } ){
# foreach my $cave2com( # {$cave_used} ){
# print $seq2com->{chromosome},":" ,$cave2com->{chromosome},"\n";
# }
#}
map { print "$_= $cave_snp_list->{$_}->[0]->{chromosome}\n" } sort keys %$cave_snp_list;
map { print "$_= $geno_seq_list->{$_}->[0]->{chromosome}\n" } sort keys %$geno_seq_list;
}
sub getseqgenotypes {
my $snpconn;
my $gen_list = {};
$snpconn = Database::Conn->new('live');
$snpconn->addConnection(DBI->connect('dbi:Oracle:pssd.world', 'sn', 'ss', { RaiseError => 1, AutoCommit => 0 }),
'pssd');
#my $conn2 =Database::Conn->new('live');
#$conn2->addConnection(DBI->connect('dbi:Oracle:COSI.world','nst_owner','nst_owner', {RaiseError =>1 , AutoCommit=>0}),'nst');
my $id_ind = $snpconn->execute('snp::Sequenom::getIdIndforExomeSample', $sample);
my $genotype = $snpconn->executeArrRef('snp::Sequenom::getGenotypeCallsPosition', $id_ind);
foreach my $geno (#{$genotype}) {
push #{ $gen_list->{ $geno->[1] } }, {
chromosome => $geno->[2],
position => $geno->[3],
genotype => $geno->[4],
};
}
return ($gen_list);
} #end of sub getseqgenotypes
sub getcavemansnpfile {
my $nstconn;
my $caveman_list = {};
$nstconn = Database::Conn->new('live');
$nstconn->addConnection(
DBI->connect('dbi:Oracle:CANP.world', 'nst_owner', 'NST_OWNER', { RaiseError => 1, AutoCommit => 0 }), 'nst');
my $id_sample = $nstconn->execute('nst::Caveman::getSampleid', $sample);
#print "IDSample: $id_sample\n";
my $file_location = $nstconn->execute('nst::Caveman::getCaveManSNPSFile', $id_sample);
open(SNPFILE, "<$file_location") || die "Error: Cannot open the file $file_location:$!\n";
while (<SNPFILE>) {
chomp;
next if /^>/;
my #data = split;
my ($nor_geno, $tumor_geno) = split /\//, $data[5];
# array of hash
push #{ $caveman_list->{$sample} }, {
chromosome => $data[0],
position => $data[1],
genotype => $nor_geno,
};
} #end of while loop
close(SNPFILE);
return ($caveman_list);
}
The problem that I see is that you're constructing a tree for generic storage of data, when what you want is a graph, specific to the task. While you are constructing the record, you could also be constructing the part that groups data together. Below is just one example.
my %genotype_for;
my $record
= { chromosome => $data[0]
, position => $data[1]
, genotype => $nor_geno
};
push #{ $gen_list->{ $geno->[1] } }, $record;
# $genotype_for{ position }{ chromosome }{ name of array } = genotype code
$genotype_for{ $data[1] }{ $data[0] }{ $sample } = $nor_geno;
...
return ( $caveman_list, \%genotype_for );
In the main line, you receive them like so:
my ( $cave_snp_list, $geno_lookup ) = getcavemansnpfile( $sample );
This approach at least allows you to locate similar position and chromosome values. If you're going to do much with this, I might suggest an OO approach.
Update
Assuming that you wouldn't have to store the label, we could change the lookup to
$genotype_for{ $data[1] }{ $data[0] } = $nor_geno;
And then the comparison could be written:
foreach my $pos ( keys %$small_lookup ) {
next unless _HASH( my $sh = $small_lookup->{ $pos } )
and _HASH( my $lh = $large_lookup->{ $pos } )
;
foreach my $chrom ( keys %$sh ) {
next unless my $sc = $sh->{ $chrom }
and my $lc = $lh->{ $chrom }
;
print "$sc:$sc";
}
}
However, if you had limited use for the larger list, you could construct the specific case
and pass that in as a filter when creating the longer list.
Thus, in whichever loop creates the longer list, you could just go
...
next unless $sample{ $position }{ $chromosome };
my $record
= { chromosome => $chromosome
, position => $position
, genotype => $genotype
};
...