what is the correct loop/conditional option for not finding a variable? - perl

I am searching through three text files for one of four specific gene names (stored in $var#). When it is found, it takes the value found after the gene name and adds it to a count. We then average the value by taking total $count_exp# and dividing by the number of appearances within all files.
What is the proper way to let the user know when a gene name is not found in each file? I'm having difficulties handling the flow of this loop/conditional.
Here is a snippet of code that handles one of the three text files....
foreach $hyperosmotic(@hyperosmotic)
{
@hyperosmotic1=split(/\t/,$hyperosmotic);
$name=$hyperosmotic1[0];
$exp=$hyperosmotic1[1];
chomp $name;
chomp $exp;
if ($name eq $var1)
{
$count_exp1 = $count_exp1 + $exp;
$count_var1 = ++$count_var1;
}
elsif ($name eq $var2)
{
$count_exp2 = $count_exp2 + $exp;
$count_var2 = ++$count_var2;
}
elsif ($name eq $var3)
{
$count_exp3 = $count_exp3 + $exp;
$count_var3 = ++$count_var3;
}
elsif ($name eq $var4)
{
$count_exp4 = $count_exp4 + $exp;
$count_var4 = ++$count_var4;
}
}

You basically want to use arrays:
(and use strict; use warnings;)
my @count_var = (0)x4;
my @count_exp = (0)x4;
my @var = ($var1, $var2, ...);
HYPEROSMOTIC:
for my $hyperosmotic (@hyperosmotic) {
my ($name, $exp) = split /\t/, $hyperosmotic;
for my $i (0 .. $#var) {
if ($name eq $var[$i]) {
$count_exp[$i] += $exp;
$count_var[$i]++;
next HYPEROSMOTIC; # jump into next iteration of the labeled loop
}
}
# this code is only reached if no var matched:
die qq[I don't have a var for name "$name"];
# That just threw a fatal error. You may want to do something different.
}
You could improve efficiency by using hashes:
my %counts = (
$var1 => {exp => 0, var => 0},
$var2 => {exp => 0, var => 0},
$var3 => {exp => 0, var => 0},
$var4 => {exp => 0, var => 0},
);
for my $hyperosmotic (@hyperosmotic) {
my ($name, $exp) = split ...;
if (my $count = $counts{$name}) {
$count->{exp} += $exp;
$count->{var}++;
} else {
die qq[I don't have a var for name "$name"];
}
}
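Neither loop above reports a gene that never shows up in a file, which is what the question asks for. A minimal sketch of one way to do that, assuming each file's lines are already in memory: tally per file, then list the genes whose hit count is still zero. The gene names, file names, and the helper tally_file are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical gene names and file contents, for illustration only.
my @genes = qw(GENE_A GENE_B GENE_C GENE_D);
my %files = (
    'hyperosmotic.txt' => [ "GENE_A\t2.0", "GENE_B\t4.0", "GENE_A\t6.0" ],
    'heatshock.txt'    => [ "GENE_C\t1.5" ],
);

# Tally expression totals and hit counts for one file;
# also return the genes that were never seen in it.
sub tally_file {
    my ($lines, $genes) = @_;
    my %counts = map { $_ => { exp => 0, var => 0 } } @$genes;
    for my $line (@$lines) {
        chomp(my $copy = $line);
        my ($name, $exp) = split /\t/, $copy;
        next unless defined $name && exists $counts{$name};   # not a gene we track
        $counts{$name}{exp} += $exp;
        $counts{$name}{var}++;
    }
    my @missing = grep { $counts{$_}{var} == 0 } @$genes;
    return (\%counts, \@missing);
}

for my $file (sort keys %files) {
    my ($counts, $missing) = tally_file($files{$file}, \@genes);
    print "$file: not found: @$missing\n" if @$missing;
    for my $gene (grep { $counts->{$_}{var} } @genes) {
        printf "%s: %s average %.2f\n", $file, $gene,
            $counts->{$gene}{exp} / $counts->{$gene}{var};
    }
}
```

This sketch averages per file; to average across all files as in the question, the per-file totals would be summed before dividing.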

Related

Split string into a hash of hashes (perl)

At the moment I'm a little confused.
I am looking for a way to turn a string containing an arbitrary number of words (separated by slashes) into a nested hash.
These "strings" are output from a text database.
Given is for example
"office/1/hardware/mouse/count/200"
the next one can be longer or shorter..
This must be created from it:
{
office {
1{
hardware {
mouse {
count => 200
}
}
}
}
}
Any idea ?
Work backwards. Split the string. Use the last two elements to make the inner-most hash. While more words exist, make each one the key of a new hash, with the inner hash as its value.
my $s = "office/1/hardware/mouse/count/200";
my @word = split(/\//, $s);
# Bottom level taken explicitly
my $val = pop @word;
my $key = pop @word;
my $h = { $key => $val };
while ( defined( my $key = pop @word ) ) # defined() so that a key of "0" does not end the loop early
{
$h = { $key => $h };
}
Simple recursive function should do
use strict;
use warnings;
use Data::Dumper;
sub foo {
my $str = shift;
my ($key, $rest) = split m|/|, $str, 2;
if (defined $rest) {
return { $key => foo($rest) };
} else {
return $key;
}
}
my $hash = foo("foo/bar/baz/2");
print Dumper $hash;
Gives output
$VAR1 = {
'foo' => {
'bar' => {
'baz' => '2'
}
}
};
But like I said in the comment: What do you intend to use this for? It is not a terribly useful structure.
If there are many lines to be read into a single hash and the lines have a variable number of fields, you have big problems and the other two answers will clobber data by either smashing sibling keys or overwriting final values. I'm supposing this because there is no rational reason to convert a single line into a hash.
You will have to walk down the hash with each field. This will also give you the most control over the process.
our $hash = {};
our $eolmark = "\000";
while (my $line = <...>) {
chomp $line;
my @fields = split /\//, $line;
my $count = @fields;
my $h = $hash;
my $i = 0;
map { (++$i == $count) ?
($h->{$_}{$eolmark} = 1) :
($h = $h->{$_} ||= {});
} @fields;
}
$h->{$_}{$eolmark} = 1: You need the special "end of line" key so that you can recognize the end of a record and still permit longer records to coexist. If you had the two records foo/bar/baz and foo/bar/baz/quux, the second would otherwise overwrite the final value of the first.
$h = $h->{$_} ||= {}: This statement is a very handy idiom that creates the next level if it does not exist yet and takes a shortcut reference to it in one step. Never do a hash lookup more than once.
HTH
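To make the end-of-record marker concrete, here is a small self-contained sketch of the same walk, written as a plain loop instead of map and fed the two records mentioned above. The helper name add_record is hypothetical; the marker value and the idiom are taken from the answer.

```perl
use strict;
use warnings;

# Insert one slash-separated record into the shared tree and mark where it ends.
sub add_record {
    my ($tree, $line, $eolmark) = @_;
    my $h = $tree;
    for my $field (split m{/}, $line) {
        $h = $h->{$field} ||= {};    # create the level if needed, then descend
    }
    $h->{$eolmark} = 1;              # a record ends at this node
    return $tree;
}

my $eolmark = "\000";
my %tree;
add_record(\%tree, $_, $eolmark) for qw(foo/bar/baz foo/bar/baz/quux);

# Both records coexist: 'baz' ends a record and also has the child 'quux'.
print "baz ends a record\n"  if $tree{foo}{bar}{baz}{$eolmark};
print "quux ends a record\n" if $tree{foo}{bar}{baz}{quux}{$eolmark};
```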

Perl - "Complex" Data Structure

I'm trying to get a workable data structure that I can pull the element values from in a sensible fashion. Just having great difficulty working with the data once its in the structure. This is how the struct is built:
sub hopCompare
{
my %count;
my %master;
my $index = 0;
foreach my $objPath (@latest) #get Path object out of master array
{
my @path = @{$objPath->_getHopList()}; #dereferencing
my $iter = 0;
foreach my $hop (@path)
{
++$count{$hop}->{FREQ};
$count{$hop}->{INDEX} = $index;
$count{$hop}->{NODE} = $hop;
$index++;
}
$index = 0;
}
foreach my $element( keys %count )
{
if (defined($count{$element}->{NODE}))
{
my $curr = $count{$element}->{INDEX};
my $freq = $count{$element}->{FREQ};
if (($freq > 1) || ($count{$element}->{INDEX} =~ /[0-1]/))
{
push @{ $master{$curr} }, {$count{$element}->{NODE}, {FREQ => $count{$element}->{FREQ}}};
}
print "$element = $count{$element}\n";
print "$element Index = $count{$element}->{INDEX}\n";
}
}
print "\n Master contains: \n" . Dumper (%master);
if (%master){return %master;} else {die "NO FINAL HOPS MATCHED";}
}
Producing this structure:
%Master contains:
$VAR1 = '4';
$VAR2 = [
{
'1.1.1.2' => {
'FREQ' => 2
}
}
];
$VAR3 = '1';
$VAR4 = [
{
'1.1.1.9' => {
'FREQ' => 5
}
},
{
'1.1.1.8' => {
'FREQ' => 1
}
}
];
{truncated}
Although ideally the structure should look like this but I had even less joy trying to pull data out at sub identifyNode:
$VAR1 = {
'1' => [
{
'1.1.1.9' => {
'FREQ' => 5
}
},
{
'1.1.5.8' => {
'FREQ' => 1
}
}
],
Then to get back at the data in another method I'm using:
sub identifyNode
{
my %hops = %{$_[0]};
my $i = 0;
foreach my $h ( keys %hops ) #The HOP-INDEX is the key
{
print "\n\$h looks like \n" . Dumper ($hops{$h});
my %host = %{ $hops{$h}[0] }; #Push the first HASH in INDEX to the %host HASH
foreach my $hip (keys %host)
{
my $corelink = `corelinks $hip`;
my ($node) = $corelink =~ /([a-z0-9-]+),[a-z0-9-\/]+,$hip/s;
print "\n\t\t\tHostname is $node\n";
}
$i++;
}
}
This then generates:
$h looks like
$VAR1 = [
{
'1.1.1.2' => {
'FREQ' => 2
}
}
];
Hostname is blabla-bla-a1
$h looks like
$VAR1 = [
{
'1.1.1.9' => {
'FREQ' => 5
}
},
{
'1.1.1.8' => {
'FREQ' => 1
}
}
];
Hostname is somew-some-a1
So for each hash in $h only the topmost host gets evaluated and hostname returned. This is because it is told to do so by the [0] in line:
my %host = %{ $hops{$h}[0] };
I've played around with different data structures and de-referencing the structure a multitude of ways and this is the only halfway house I've found...
(The IPs have been obfuscated so are not consistent in my examples)
Thanks for your advice it got me halfway there. It works now (in still somewhat a convoluted fashion!) :
sub identifyNode
{
my %hops = %{$_[0]};
my $i = 0;
my @fin_nodes;
my $hindex;
foreach my $h ( keys %hops ) #The HOP-INDEX is the key
{
$hindex = $h;
foreach my $e (@{$hops{$h}}) #first part of solution credit Zdim
{
my @host = %{ $e }; #second part of solution
my $hip = $host[0];
my $corelink = `corelinks $hip`;
my ($node) = $corelink =~ /([a-z0-9-]+),[a-z0-9-\/]+,$hip/s;
print "\n\t\t\tHostname is $node\n";
push (@fin_nodes, [$node, $hindex, $e->{$hip}->{FREQ}]);
}
$i++;
}
return (\@fin_nodes);
}
Am I brave enough to add the data as a hash to @fin_nodes.. hmm
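Storing each result as a hash instead of a positional array is a small change. The sketch below is one possible shape, assuming the input looks like the Dumper output in the question; it leaves out the external corelinks lookup entirely, and the helper name flatten_hops is hypothetical.

```perl
use strict;
use warnings;

# Flatten { index => [ { ip => { FREQ => n } }, ... ] } into a list of hash records.
sub flatten_hops {
    my ($hops) = @_;
    my @records;
    for my $index (sort { $a <=> $b } keys %$hops) {
        for my $entry (@{ $hops->{$index} }) {       # every hash at this index, not just [0]
            for my $ip (sort keys %$entry) {
                push @records, {
                    INDEX => $index,
                    NODE  => $ip,
                    FREQ  => $entry->{$ip}{FREQ},
                };
            }
        }
    }
    return \@records;
}

# Sample data shaped like the Dumper output in the question.
my %master = (
    4 => [ { '1.1.1.2' => { FREQ => 2 } } ],
    1 => [ { '1.1.1.9' => { FREQ => 5 } }, { '1.1.1.8' => { FREQ => 1 } } ],
);

for my $rec (@{ flatten_hops(\%master) }) {
    print "index $rec->{INDEX}: $rec->{NODE} seen $rec->{FREQ} time(s)\n";
}
```

Named fields such as $rec->{FREQ} tend to be easier to read later than positions like $rec->[2].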

looping through hash of hashes reference perl

I have a hash of hashes that I am passing to a subroutine. In the subroutine I need to loop over the hash of hashes and access the value of the inner hash based on the outer hash's key. I am having trouble referencing and dereferencing the hash of hashes.
Here is my code.
use List::Util qw( min max );
#@testingWords is array of strings
foreach(@testingWords)
{
#skip values that are '[' or ']' and move onto next value in array.
if($_ eq '[' or $_ eq ']')
{
next;
}
#if value in array matches key in %trainingHashRaw (hash of hashes)pass key to getMax.
if($trainingHashRaw{$_})
{
#key is $_, value is returned string from getMax
#%trainingHashRelative is hash of hashes
$testingHash{$_} = getMax($_, \%trainingHashRelative);
}
}
sub getMax
{
my $key = shift;
my $hash = shift;
my @max = ();
my $max = 0;
my $tag = "";
for my $i(keys $hash)
{
for my $j(keys $hash->{$i})
{
if($key eq $i)
{
push(@max, $hash->{$i}->{$j});
}
}
if(@max)
{
$max = max @max;
}
}
for my $i(keys $hash)
{
for my $j(keys $hash->{$i})
{
if($max == $hash->{$i}->{$j})
{
$tag = $j;
}
}
}
return $tag;
}
It is not very clear what your data structures should contain, so I made up an example of what I think you meant. The max value can occur more than once, so I keep track of all occurrences.
use strict;
use warnings;
use List::Util qw( min max );
use Data::Dumper;
# for each outer key we want to get the max of values of the associated hashref
# outer_a: 3
# outer_b: 100
my %hash_of_hashes = (
outer_a => {
inner_a_x => 1,
inner_a_y => 2,
inner_a_z => 3,
},
outer_b => {
inner_a_x => -100,
inner_a_y => 100,
},
outer_c => {
inner_a_x => 100,
inner_a_y => 1,
}
);
my ($max_value, $keys_of_max_value) = get_max( \%hash_of_hashes );
print "The max value $max_value occurred in ", join( ', ', @{$keys_of_max_value} ), ".\n";
sub get_max {
my ($hoh_ref) = @_; # reference to a hash of hashes
# %tmp keeps track of the outer_key for the max of the inner values
# %tmp = (
# 3 => ['outer_a'],
# 100 => ['outer_b', 'outer_c']
# )
my %tmp;
while ( my ($outer_key, $inner_hashref) = each %{$hoh_ref} ) {
my @inner_values = values %{$inner_hashref};
my $max_inner_values = max( @inner_values );
$tmp{$max_inner_values} ||= []; # for clarity create the array ref explicitly
push @{$tmp{$max_inner_values}}, $outer_key;
}
my $max_value = max( keys %tmp ); # 100
return ( $max_value, $tmp{$max_value} ); # 100, ['outer_b', 'outer_c'] (order may vary)
}
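If the goal is instead what getMax in the question attempts, namely the inner key with the largest value under one given outer key, there is no need to loop over the whole structure. A minimal sketch under that assumption follows; the helper name max_tag_for and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Return the inner key with the largest value under $outer_key,
# or undef if the outer key is absent.
sub max_tag_for {
    my ($outer_key, $hoh) = @_;
    my $inner = $hoh->{$outer_key} or return;
    my ($best_tag, $best_value);
    for my $tag (sort keys %$inner) {          # sorted so ties resolve predictably
        if (!defined $best_value || $inner->{$tag} > $best_value) {
            ($best_tag, $best_value) = ($tag, $inner->{$tag});
        }
    }
    return $best_tag;
}

# Hypothetical word => { tag => relative frequency } data.
my %training = (
    run  => { VB => 0.7, NN => 0.3 },
    fast => { RB => 0.5, JJ => 0.4, NN => 0.1 },
);

print "run  => ", max_tag_for('run',  \%training), "\n";
print "fast => ", max_tag_for('fast', \%training), "\n";
```

Because the reference is dereferenced with %$inner, this also avoids calling keys directly on a scalar reference, which recent versions of Perl no longer accept.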

Perl LDAP search - over 1500 member in a group

I want to use a Perl script with an LDAP connection to find all members of a group that has over 10,000 members.
I only get results if I set $first=0 and $last=1499, and then I get only the first 1500 members of the group.
If I use other values for $first and $last, I get no results.
"$ldapsearchresult = $ldapconnect->search (
Sizelimit => 0,
base => 'any_base',
filter => '(objectClass=*)',
attr => ['member;range=$first-$last'],
);"
Thanks for your help!
You need to request the attribute with a range subtype again and again until the range that comes back ends in '*'.
Here is the code I am using; it also uses a paged search in AD.
use Net::LDAP;
use Net::LDAP qw(LDAP_CONTROL_PAGED);
use Net::LDAP::Util qw(ldap_error_name canonical_dn ldap_explode_dn ldap_error_text);
use Net::LDAP::Control::Paged;
my $page_page = Net::LDAP::Control::Paged->new( 'size' => $input{'page'} );
my $finished_search = 0;
my $page_cookie;
my $result;
my @page_search_args = (
'base' => $input{"base"},
'scope' => $input{'scope'},
'filter' => $input{'filter'},
'attrs' => $input{'attrs'},
'control' => [ $page_page ],
'deref' => 'never',
'raw' => qr!^DO_NOT_MATCH!,
);
while (!$finished_search) {
my $msg = $ldap->search(@page_search_args);
if ($msg->is_error()) {
die "ERROR: ",$msg->error,"\n";
last;
} else {
my ($response) = $msg->control(LDAP_CONTROL_PAGED);
$page_cookie = $response->cookie();
$finished_search = 1 if !$page_cookie;
$page_page->cookie($page_cookie);
while (my $entry = $msg->pop_entry()){
$ldap_searches++;
print_all_attributes($entry);
}
}
}
if ($page_cookie) {
$page_page->cookie($page_cookie);
$page_page->size(0);
$ldap->search(@page_search_args);
}
sub add_result {
my $dn = shift;
my $attr = shift;
my $data = shift;
my $res = shift;
$attr =~ s!(;range\=\d+\-\d+)!!i;
#print "removed $1 from $attr" if $1;
foreach my $subtype (keys %{$data}){
$attr = $attr.$subtype if $subtype ne '';
$attr =~ s!(;range\=\d+\-\d+)!!i;
if (defined $$res->{$dn}->{$attr}){
push(@{$$res->{$dn}->{$attr}},@{$data->{$subtype}});
} else {
push(@{$$res->{$dn}->{$attr}},@{$data->{$subtype}});
}
}
return $res;
}
sub print_all_attributes {
my $entry = shift;
foreach my $attr ($entry->attributes()) {
if ($attr =~ /;range=/) {
my $last = 0;my $first = 0;
### $var will look like this --> "member;range=0-1499"
(my $pure_attr,my $range) = split /;/, $attr,2;
(my $junk,$range) = split /=/, $range,2;
($first,$last) = split /-/, $range,2;
$i++;
add_result($entry->dn(),$pure_attr,$entry->get_value($attr,alloptions => 1, asref => 1),\$result) if $last eq '*' or $last >= $parms{'attribute_page'};
### if $last eq "*", indicates this is the last range increment, and
### we do not need to perform another supplemental search
if ($last ne "*") {
my $range_diff = ($last - $first) + 1;
my $increment = $last + $range_diff;
$last = $last + 1;
$attr = "$pure_attr;range=$last-$increment";
$parms{'attrs'} = [$attr];
search_nonpaged(%parms);
}
} else {### if $attr matches range pattern
add_result($entry->dn(),$attr,$entry->get_value($attr,alloptions => 1, asref => 1),\$result);
}
}
return 1;
}
sub search_nonpaged{
my %input = @_;
my @page_search_args = (
'base' => $input{"base"},
'scope' => $input{'scope'},
'filter' => $input{'filter'},
'attrs' => $input{'attrs'},
'deref' => 'never',
'raw' => qr!^DO_NOT_MATCH!,
);
my $msg = $ldap->search(@page_search_args);
if ($msg->is_error()) {
die "ERROR: ",$msg->error,"\n";
}
while (my $entry = $msg->pop_entry()){
$ldap_searches++;
print_all_attributes($entry);
}
}
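The range-stepping arithmetic buried in print_all_attributes above can be isolated into a small pure function, which makes it easier to check without an LDAP server. The sketch below assumes the attribute name comes back in the form shown in the code comment ("member;range=0-1499"); the helper name next_range_attr is hypothetical and is not part of Net::LDAP.

```perl
use strict;
use warnings;

# Given an attribute name as returned by the server, e.g. "member;range=0-1499",
# return the attribute to request next, or undef if this was the last chunk
# (or the attribute carries no range at all).
sub next_range_attr {
    my ($attr) = @_;
    my ($base, $first, $last) = $attr =~ /^([^;]+);range=(\d+)-(\d+|\*)$/i
        or return;                       # not a ranged attribute
    return if $last eq '*';              # '*' marks the final chunk
    my $size = $last - $first + 1;       # reuse the chunk size the server chose
    return sprintf '%s;range=%d-%d', $base, $last + 1, $last + $size;
}

for my $attr ('member;range=0-1499', 'member;range=1500-2999', 'member;range=3000-*') {
    my $next = next_range_attr($attr);
    print "$attr -> ", (defined $next ? $next : 'done'), "\n";
}
```

Each returned string would be passed as the single entry of attrs in the next search, repeating until the function returns undef.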
You may be able to simplify the program by searching for:
memberOf=CN=GroupOne,OU=Security Groups,OU=Groups,DC=YOURDOMAIN,DC=NET
You will still need to use the paged results control but will not need the range control.
Microsoft Active Directory uses the MaxValRange to control the number of values that are returned in the retrieval of multi-valued attributes of an entry.
By using the filter above, you can avoid the MaxValRange settings.
BY THE WAY: if you want to obtain nested members also, try:
(memberOf:1.2.840.113556.1.4.1941:=CN=GroupOne,OU=Security Groups,OU=Groups,DC=YOURDOMAIN,DC=NET)
This filter uses the LDAP_MATCHING_RULE_IN_CHAIN extensible match.
-jim
I found an easier way to retrieve all members of an AD group:
http://permalink.gmane.org/gmane.comp.lang.perl.modules.ldap/246
use Net::LDAP; use Net::LDAP::Util;
# Connect to AD make sure to specify version 3
$ldap = new Net::LDAP("myGC.yy.xx.com",
port => 3268,
debug => 0,
version => 3
) or die "New failed:$@";
# Do an anonymous bind. You MAY have to do an authenticated bind in your configuration
$result=$ldap->ldapbind() || die "Bind Failed:$@";
# Some error trapping
$err=$result->code;
if ($err){
$errname=Net::LDAP::Util::ldap_error_name($err);
$errtxt=Net::LDAP::Util::ldap_error_text($err);
if ($errtxt){
print "($err) $errtxt\n";
}
else
{
if ($errname){
print "($err) $errname\n";
}
else
{
print "ERR: $err\n";
}
}
exit;
}
# The combination of the search base and filter determine which object that you
# retrieve
# set search filter to groups of objects. This is what you want to enumerate NT groups.
$filter="(objectClass=group)";
# Set the search base to the DN of the object that you want to retrieve. BTW, using this method on
# groups with less than 1000 members works as well.
$base='CN=mygroup,DC=yyy,DC=xxx,DC=com';
# Set the initial attribute indexes and name
$found=1; $startr=0; $endr=-1; $startattr="member";
while($found){
# Create the attribute range specification
$startr=$endr+1;
$endr=$startr+999;
$attr="$startattr;range=$startr-$endr";
$saveattr=$attr;
@attr=("$attr");
# Perform the search
$result=$mesg = $ldap->search(base => "$base",filter => $filter,
attrs => [@attr],
scope => "sub") or die "search died";
# Some error trapping
$err=$result->code;
if ($err){
if (!($err == 1)){
$errname=Net::LDAP::Util::ldap_error_name($err);
$errtxt=Net::LDAP::Util::ldap_error_text($err);
if ($errtxt){
print "($err) $errtxt\n";
}
else
{
if ($errname){
print "($err) $errname\n";
}
else
{
print "ERR: $err\n";
}
}
}
else
{
print "COUNT=$cnt\n";
}
exit;
}
$found=0;
# OK, get the attribute range...so we can update the value of the attribute
# on the next pass
foreach $entry ($mesg->all_entries) {
@attr=$entry->attributes;
foreach(@attr){
$curattr=$_;
}
}
# Print out the current chunk of members
foreach $entry ($mesg->all_entries) {
$ar=$entry->get("$curattr");
foreach(@$ar){
$cnt++;
print "$_\n";
}
$found=1;
if (!@$ar[0]){
$found=0;
}
}
# Check to see if we got the last chunk. If we did, print the total and clear
# the found flag so we don't search for anymore members
if ($curattr=~/\;range=/){
if ($curattr=~/\-\*/){
print "LASTCOUNT:$cnt\n";
$found=0;
}
}
}

Perl Working On Two Hash References

I would like to compare the values of two hash references.
The data dumper of my first hash is this:
$VAR1 = {
'42-MG-BA' => [
{
'chromosome' => '19',
'position' => '35770059',
'genotype' => 'TC'
},
{
'chromosome' => '2',
'position' => '68019584',
'genotype' => 'G'
},
{
'chromosome' => '16',
'position' => '9561557',
'genotype' => 'G'
},
The second hash is similar to this, but with more hashes in the array. I would like to compare the genotypes in my first and second hash when the position and the chromosome match.
map {print "$_= $cave_snp_list->{$_}->[0]->{chromosome}\n"}sort keys %$cave_snp_list;
map {print "$_= $geno_seq_list->{$_}->[0]->{chromosome}\n"}sort keys %$geno_seq_list;
I could do that for the first array of the hashes.
Could you help me in how to work for all the arrays?
This is my actual code in full
#!/software/bin/perl
use strict;
use warnings;
use Getopt::Long;
use Benchmark;
use Config::Config qw(Sequenom.ini);
use Database::Conn;
use Data::Dumper;
GetOptions("sam=s" => \my $sample);
my $geno_seq_list = getseqgenotypes($sample);
my $cave_snp_list = getcavemansnpfile($sample);
#print Dumper($geno_seq_list);
print scalar %$geno_seq_list, "\n";
foreach my $sam (keys %{$geno_seq_list}) {
my $seq_used = $geno_seq_list->{$sam};
my $cave_used = $cave_snp_list->{$sam};
print scalar(@$geno_seq_list->{$_}) if sort keys %$geno_seq_list, "\n";
print scalar(@$cave_used), "\n";
#foreach my $seq2com (@ {$seq_used } ){
# foreach my $cave2com( @ {$cave_used} ){
# print $seq2com->{chromosome},":" ,$cave2com->{chromosome},"\n";
# }
#}
map { print "$_= $cave_snp_list->{$_}->[0]->{chromosome}\n" } sort keys %$cave_snp_list;
map { print "$_= $geno_seq_list->{$_}->[0]->{chromosome}\n" } sort keys %$geno_seq_list;
}
sub getseqgenotypes {
my $snpconn;
my $gen_list = {};
$snpconn = Database::Conn->new('live');
$snpconn->addConnection(DBI->connect('dbi:Oracle:pssd.world', 'sn', 'ss', { RaiseError => 1, AutoCommit => 0 }),
'pssd');
#my $conn2 =Database::Conn->new('live');
#$conn2->addConnection(DBI->connect('dbi:Oracle:COSI.world','nst_owner','nst_owner', {RaiseError =>1 , AutoCommit=>0}),'nst');
my $id_ind = $snpconn->execute('snp::Sequenom::getIdIndforExomeSample', $sample);
my $genotype = $snpconn->executeArrRef('snp::Sequenom::getGenotypeCallsPosition', $id_ind);
foreach my $geno (@{$genotype}) {
push @{ $gen_list->{ $geno->[1] } }, {
chromosome => $geno->[2],
position => $geno->[3],
genotype => $geno->[4],
};
}
return ($gen_list);
} #end of sub getseqgenotypes
sub getcavemansnpfile {
my $nstconn;
my $caveman_list = {};
$nstconn = Database::Conn->new('live');
$nstconn->addConnection(
DBI->connect('dbi:Oracle:CANP.world', 'nst_owner', 'NST_OWNER', { RaiseError => 1, AutoCommit => 0 }), 'nst');
my $id_sample = $nstconn->execute('nst::Caveman::getSampleid', $sample);
#print "IDSample: $id_sample\n";
my $file_location = $nstconn->execute('nst::Caveman::getCaveManSNPSFile', $id_sample);
open(SNPFILE, "<$file_location") || die "Error: Cannot open the file $file_location:$!\n";
while (<SNPFILE>) {
chomp;
next if /^>/;
my @data = split;
my ($nor_geno, $tumor_geno) = split /\//, $data[5];
# array of hash
push @{ $caveman_list->{$sample} }, {
chromosome => $data[0],
position => $data[1],
genotype => $nor_geno,
};
} #end of while loop
close(SNPFILE);
return ($caveman_list);
}
The problem that I see is that you're constructing a tree for generic storage of data, when what you want is a graph, specific to the task. While you are constructing the record, you could also be constructing the part that groups data together. Below is just one example.
my %genotype_for;
my $record
= { chromosome => $data[0]
, position => $data[1]
, genotype => $nor_geno
};
push @{ $gen_list->{ $geno->[1] } }, $record;
# $genotype_for{ position }{ chromosome }{ name of array } = genotype code
$genotype_for{ $data[1] }{ $data[0] }{ $sample } = $nor_geno;
...
return ( $caveman_list, \%genotype_for );
In the main line, you receive them like so:
my ( $cave_snp_list, $geno_lookup ) = getcavemansnpfile( $sample );
This approach at least allows you to locate similar position and chromosome values. If you're going to do much with this, I might suggest an OO approach.
Update
Assuming that you wouldn't have to store the label, we could change the lookup to
$genotype_for{ $data[1] }{ $data[0] } = $nor_geno;
And then the comparison could be written:
foreach my $pos ( keys %$small_lookup ) {
next unless _HASH( my $sh = $small_lookup->{ $pos } )
and _HASH( my $lh = $large_lookup->{ $pos } )
;
foreach my $chrom ( keys %$sh ) {
next unless my $sc = $sh->{ $chrom }
and my $lc = $lh->{ $chrom }
;
print "$sc:$lc";
}
}
However, if you had limited use for the larger list, you could construct the specific case
and pass that in as a filter when creating the longer list.
Thus, in whichever loop creates the longer list, you could just go
...
next unless $sample{ $position }{ $chromosome };
my $record
= { chromosome => $chromosome
, position => $position
, genotype => $genotype
};
...
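As a self-contained illustration of the lookup idea, the sketch below indexes both lists by locus and then compares genotypes only where the locus appears in both. It swaps the nested {position}{chromosome} hash from the answer for a single "chromosome:position" key, which is a simplification of mine rather than the answer's method. The data and the helper name index_by_locus are hypothetical.

```perl
use strict;
use warnings;

# Index a list of { chromosome, position, genotype } records by "chrom:pos".
sub index_by_locus {
    my ($records) = @_;
    my %lookup;
    $lookup{"$_->{chromosome}:$_->{position}"} = $_->{genotype} for @$records;
    return \%lookup;
}

# Hypothetical data, shaped like the Dumper output in the question.
my @sequenom = (
    { chromosome => '19', position => '35770059', genotype => 'TC' },
    { chromosome => '2',  position => '68019584', genotype => 'G'  },
);
my @caveman = (
    { chromosome => '19', position => '35770059', genotype => 'TC' },
    { chromosome => '2',  position => '68019584', genotype => 'A'  },
    { chromosome => '16', position => '9561557',  genotype => 'G'  },
);

my $small = index_by_locus(\@sequenom);
my $large = index_by_locus(\@caveman);

for my $locus (sort keys %$small) {
    next unless exists $large->{$locus};     # only loci present in both lists
    my ($s, $l) = ($small->{$locus}, $large->{$locus});
    printf "%s %s vs %s: %s\n", $locus, $s, $l, $s eq $l ? 'match' : 'differ';
}
```

Building each index is a single pass, so the comparison avoids the nested loop over both arrays that the commented-out code in the question would have needed.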