Perl LDAP search - over 1500 member in a group

Perl LDAP search - over 1500 member in a group - perl

I want to search with an Perl script and ldap connection all members of a group with over 10.000 member.
I can only find results, if i set $first=0 and $last=1499 and than i get only the first 1500 member of the group.
If i use other parameter for $first and $last, then i got no results.
"$ldapsearchresult = $ldapconnect->search (
Sizelimit => 0,
base => 'any_base',
filter => '(objectClass=*)',
attr => ['member;range=$first-$last'],
);"
Thanks for your help!

You need to search the attribute range as a subtype again and again until the last return '*'.
Here is the code I am using, it is also use paged search in AD.
use Net::LDAP;
use Net::LDAP qw(LDAP_CONTROL_PAGED);
use Net::LDAP::Util qw(ldap_error_name canonical_dn ldap_explode_dn ldap_error_text);
use Net::LDAP::Control::Paged;
my $page_page = Net::LDAP::Control::Paged->new( 'size' => $input{'page'} );
my $finished_search = 0;
my $page_cookie;
my $result;
my #page_search_args = (
'base' => $input{"base"},
'scope' => $input{'scope'},
'filter' => $input{'filter'},
'attrs' => $input{'attrs'},
'control' => [ $page_page ],
'deref' => 'never',
'raw' => qr!^DO_NOT_MATCH!,
);
while (!$finished_search) {
my $msg = $ldap->search(#page_search_args);
if ($msg->is_error()) {
die "ERROR: ",$msg->error,"\n";
last;
} else {
my ($response) = $msg->control(LDAP_CONTROL_PAGED);
$page_cookie = $response->cookie();
$finished_search = 1 if !$page_cookie;
$page_page->cookie($page_cookie);
while (my $entry = $msg->pop_entry()){
$ldap_searches++;
print_all_attributes($entry);
}
}
}
if ($page_cookie) {
$page_page->cookie($page_cookie);
$page_page->size(0);
$ldap->search(#page_search_args);
}
sub add_result {
my $dn = shift;
my $attr = shift;
my $data = shift;
my $res = shift;
$attr =~ s!(;range\=\d+\-\d+)!!i;
#print "removed $1 from $attr" if $1;
foreach my $subtype (keys %{$data}){
$attr = $attr.$subtype if $subtype ne '';
$attr =~ s!(;range\=\d+\-\d+)!!i;
if (defined $$res->{$dn}->{$attr}){
push(#{$$res->{$dn}->{$attr}},#{$data->{$subtype}});
} else {
push(#{$$res->{$dn}->{$attr}},#{$data->{$subtype}});
}
}
return $res;
}
sub print_all_attributes {
my $entry = shift;
foreach my $attr ($entry->attributes()) {
if ($attr =~ /;range=/) {
my $last = 0;my $first = 0;
### $var will look like this --> "member;range=0-1499"
(my $pure_attr,my $range) = split /;/, $attr,2;
(my $junk,$range) = split /=/, $range,2;
($first,$last) = split /-/, $range,2;
$i++;
add_result($entry->dn(),$pure_attr,$entry->get_value($attr,alloptions => 1, asref => 1),\$result) if $last eq '*' or $last >= $parms{'attribute_page'};
### if $last eq "*", indicates this is the last range increment, and
### we do not need to perform another supplemental search
if ($last ne "*") {
my $range_diff = ($last - $first) + 1;
my $increment = $last + $range_diff;
$last = $last + 1;
$attr = "$pure_attr;range=$last-$increment";
$parms{'attrs'} = [$attr];
search_nonpaged(%parms);
}
} else {### if $attr matches range pattern
add_result($entry->dn(),$attr,$entry->get_value($attr,alloptions => 1, asref => 1),\$result);
}
}
return 1;
}
sub search_nonpaged{
my %input = #_;
my #page_search_args = (
'base' => $input{"base"},
'scope' => $input{'scope'},
'filter' => $input{'filter'},
'attrs' => $input{'attrs'},
'deref' => 'never',
'raw' => qr!^DO_NOT_MATCH!,
);
my $msg = $ldap->search(#page_search_args);
if ($msg->is_error()) {
die "ERROR: ",$msg->error,"\n";
}
while (my $entry = $msg->pop_entry()){
$ldap_searches++;
print_all_attributes($entry);
}
}

You maybe able to simplify the program by searching for:
memberOf=CN=GroupOne,OU=Security Groups,OU=Groups,DC=YOURDOMAIN,DC=NET
You will still need to use the paged results control but will not need the range control.
Microsoft Active Directory uses the MaxValRange to control the number of values that are returned in the retrieval of multi-valued attributes of an entry.
By using the filter above, you can avoid the MaxValRange settings.
BY THE WAY: if you want to obtain nested members also, try:
(memberOf:1.2.840.113556.1.4.1941:=CN=GroupOne,OU=Security Groups,OU=Groups,DC=YOURDOMAIN,DC=NET)
This filter uses the LDAP_MATCHING_RULE_IN_CHAIN extensible match.
-jim

I found an easier way to search all member of a AD group:
http://permalink.gmane.org/gmane.comp.lang.perl.modules.ldap/246
use Net::LDAP; use Net::LDAP::Util;
# Connect to AD make sure to specify version 3
$ldap = new Net::LDAP("myGC.yy.xx.com",
port => 3268,
debug => 0,
version => 3
) or die "New failed:$ <at> ";
# Do an anonymous bind. You MAY have to do an authenticated bind in your configuration
$result=$ldap->ldapbind() || die "Bind Failed:$ <at> ";
# Some error trapping
$err=$result->code;
if ($err){
$errname=Net::LDAP::Util::ldap_error_name($err);
$errtxt=Net::LDAP::Util::ldap_error_text($err);
if ($errtxt){
print "($err) $errtxt\n";
}
else
{
if ($errname){
print "($err) $errname\n";
}
else
{
print "ERR: $err\n";
}
}
exit;
}
# The combination of the search base and filter determine which object that you
# retrieve
# set search filter to groups of objects. This is what you want to enumerate NT groups.
$filter="(objectClass=group)";
# Set the search base to the DN of the object that you want to retrieve. BTW, using this method on
# groups with less than 1000 members works as well.
$base='CN=mygroup,DCyyy,DC=xxx,DC=com';
# Set the initial attribute indexes and name
$found=1; $startr=0; $endr=-1; $startattr="member";
while($found){
# Create the attribute range specification
$startr=$endr+1;
$endr=$startr+999;
$attr="$startattr;range=$startr-$endr";
$saveattr=$attr;
<at> attr=("$attr");
# Perform the search
$result=$mesg = $ldap->search(base => "$base",filter => $filter,
attrs => [ <at> attr],
scope => "sub") or die "search died";
# Some error trapping
$err=$result->code;
if ($err){
if (!($err == 1)){
$errname=Net::LDAP::Util::ldap_error_name($err);
$errtxt=Net::LDAP::Util::ldap_error_text($err);
if ($errtxt){
print "($err) $errtxt\n";
}
else
{
if ($errname){
print "($err) $errname\n";
}
else
{
print "ERR: $err\n";
}
}
}
else
{
print "COUNT=$cnt\n";
}
exit;
}
$found=0;
# OK, get the attribute range...so we can update the value of the attribute
# on the next pass
foreach $entry ($mesg->all_entries) {
<at> attr=$entry->attributes;
foreach( <at> attr){
$curattr=$_;
}
}
# Print out the current chunk of members
foreach $entry ($mesg->all_entries) {
$ar=$entry->get("$curattr");
foreach( <at> $ar){
$cnt++;
print "$_\n";
}
$found=1;
if (! <at> $ar[0]){
$found=0;
}
}
# Check to see if we got the last chunk. If we did print toe total and set
# the found flag so we don't search for anymore members
if ($curattr=~/\;range=/){
if ($curattr=~/\-\*/){
print "LASTCOUNT:$cnt\n";
$found=0;
}
}
}

Related

Virtual Filesystem in Perl with Fuse

Anybody help me make a virtual file system in Perl.
Very simple, 2 depth level, as
/subdir
subdir-l2
file2.txt
/file1.txt
I try use Fuse.pm, but not understand how create subdir level. I create %files hash, and if go to subdir, recreate it with new records. It's for test only.
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
use Fuse;
use POSIX qw(ENOENT EISDIR EINVAL);
my (%files) = (
'.' => {
type => 0040,
mode => 0755,
ctime => 1490603721
},
subdir => {
type => 0040,
mode => 0755,
ctime => 1490603721
},
"file1.txt" => {
type => 0100,
mode => 0755,
ctime => 1490603721
}
);
sub filename_fixup {
my ($file) = shift;
$file =~ s,^/,,;
$file = '.' unless length($file);
return $file;
}
sub getdir {
my $tmp = shift;
if ($tmp eq '/') {
return (keys %files),0;
} else {
(%files) = (
'.' => {
type => 0040,
mode => 0755,
ctime => 1490603721
},
# /subdir/subdir-l2
"subdir-l2" => {
type => 0040,
mode => 0755,
ctime => 1490603721
} ,
# /subdir/a-l2.file
"file2.txt" => {
cont => "File 'al2'.\n",
type => 0100,
mode => 0755,
ctime => 1490603721
}
);
return (keys %files),0;
}
}
sub getattr {
my ($file) = filename_fixup(shift);
$file =~ s,^/,,;
$file = '.' unless length($file);
return -ENOENT() unless exists($files{$file});
my ($size) = exists($files{$file}{cont}) ? length($files{$file}{cont}) : 0;
$size = $files{$file}{size} if exists $files{$file}{size};
my ($modes) = ($files{$file}{type}<<9) + $files{$file}{mode};
my ($dev, $ino, $rdev, $blocks, $gid, $uid, $nlink, $blksize) = (0,0,0,1,0,0,1,1024);
my ($atime, $ctime, $mtime);
$atime = $ctime = $mtime = $files{$file}{ctime};
return ($dev,$ino,$modes,$nlink,$uid,$gid,$rdev,$size,$atime,$mtime,$ctime,$blksize,$blocks);
}
Fuse::main(
mountpoint => "/tmp/123",
getdir => \&getdir,
getattr => \&getattr,
);
one level mount fine, but if go to deeper i get
?????????? ? ? ? ? ? file2.txt
?????????? ? ? ? ? ? subdir-l2

I'm really not a regular user of the Fuse module, neither of FUSE system. Tinkered with this issue out of pure curiosity. Thus, although I can't explain in very much details how to use the plain Fuse module to achieve your goal, I have a working code that does create the wanted filesystem (at least on my system, and seems that it is capable of creating any arbitrary filesystem tree), and I can explain how I got this code working.
So first of all I discovered the Fuse::Simple module on CPAN.
Its SYNOPSIS shows that it provides a really simple API to the Fuse module for creating arbitrary filesystems from a hash structure. Its source code isn't that huge, so I just created 'listing.pl' script file and copied there most of the functions (except fserr that caused a Modification of a read-only value exception), put the main sub contents out, so they will be the main script's flow, hardcoded the filesystem structure ($fs var), and made some little adjustments here and there (like declare vars with my to prevent exceptions), and finally got the filesystem mounted, with all directories listed and files readable. So this is the code I got at last:
#!/usr/bin/env perl
use strict;
use warnings;
use diagnostics;
use Carp;
use Fuse;
use Errno qw(:POSIX); # ENOENT EISDIR etc
use Fcntl qw(:DEFAULT :mode); # S_IFREG S_IFDIR, O_SYNC O_LARGEFILE etc.
use Switch;
my $debug = 0;
my %codecache = ();
my $ctime = time();
my $uid = $>;
my $gid = $) + 0;
my $fs = {
"file1.txt" => "File 1 contents",
"subdir" => {
"subdir-l2" => {},
"file2.txt" => "File 2 contents"
}
};
# some default args
my %args = (
"mountpoint" => "listing",
"debug" => $debug,
"fuse_debug" => 0,
"threaded" => 0,
"/" => $fs
);
# the default subs
my %fs_subs = (
"chmod" => \&fs_not_imp,
"chown" => \&fs_not_imp,
"flush" => \&fs_flush,
"fsync" => \&fs_not_imp,
"getattr" => \&fs_getattr,
"getdir" => \&fs_getdir,
"getxattr" => \&fs_not_imp,
"link" => \&fs_not_imp,
"listxattr" => \&fs_not_imp,
"mkdir" => \&fs_not_imp,
"mknod" => \&fs_not_imp,
"open" => \&fs_open,
"read" => \&fs_read,
"readlink" => \&fs_readlink,
"release" => \&fs_release,
"removexattr" => \&fs_not_imp,
"rmdir" => \&fs_not_imp,
"rename" => \&fs_not_imp,
"setxattr" => \&fs_not_imp,
"statfs" => \&fs_statfs,
"symlink" => \&fs_not_imp,
"truncate" => \&fs_truncate,
"unlink" => \&fs_not_imp,
"utime" => sub{return 0},
"write" => \&fs_write,
);
# except extract these ones back out.
$debug = delete $args{"debug"};
$args{"debug"} = delete( $args{"fuse_debug"} ) || 0;
delete $args{"/"};
# add the functions, if not already defined.
# wrap in debugger if debug is set.
for my $name (keys %fs_subs) {
my $sub = $fs_subs{$name};
# $sub = wrap($sub, $name) if $debug;
$args{$name} ||= $sub;
}
Fuse::main(%args);
sub fetch {
my ($path, #args) = #_;
my $obj = $fs;
for my $elem (split '/', $path) {
next if $elem eq ""; # skip empty // and before first /
$obj = runcode($obj); # if there's anything to run
# the dir we're changing into must be a hash (dir)
return ENOTDIR() unless ref($obj) eq "HASH";
# note that ENOENT and undef are NOT the same thing!
return ENOENT() unless exists $obj->{$elem};
$obj = $obj->{$elem};
}
return runcode($obj, #args);
}
sub runcode {
my ($obj, #args) = #_;
while (ref($obj) eq "CODE") {
my $old = $obj;
if (#args) { # run with these args. don't cache
delete $codecache{$old};
print "running $obj(",quoted(#args),") NO CACHE\n" if $debug;
$obj = saferun($obj, #args);
} elsif (exists $codecache{$obj}) { # found in cache
print "got cached $obj\n" if $debug;
$obj = $codecache{$obj}; # could be undef, or an error, BTW
} else {
print "running $obj() to cache\n" if $debug;
$obj = $codecache{$old} = saferun($obj);
}
if (ref($obj) eq "NOCACHE") {
print "returned a nocache() value - flushing\n" if $debug;
delete $codecache{$old};
$obj = $$obj;
}
print "returning ",ref($obj)," ",
defined($obj) ? $obj : "undef",
"\n" if $debug;
}
return $obj;
}
sub saferun {
my ($sub, #args) = #_;
my $ret = eval { &$sub(#args) };
my $died = $#;
if (ref($died)) {
print "+++ Error $$died\n" if ref($died) eq "ERROR";
return $died;
} elsif ($died) {
print "+++ $died\n";
# stale file handle? moreorless?
return ESTALE();
}
return $ret;
}
sub nocache {
return bless(\ shift, "NOCACHE"); # yup, utter abuse of bless :-)
}
sub dump_open_flags {
my $flags = shift;
printf " flags: 0%o = (", $flags;
for my $bits (
[ O_ACCMODE(), O_RDONLY(), "O_RDONLY" ],
[ O_ACCMODE(), O_WRONLY(), "O_WRONLY" ],
[ O_ACCMODE(), O_RDWR(), "O_RDWR" ],
[ O_APPEND(), O_APPEND(), "|O_APPEND" ],
[ O_NONBLOCK(), O_NONBLOCK(), "|O_NONBLOCK" ],
[ O_SYNC(), O_SYNC(), "|O_SYNC" ],
[ O_DIRECT(), O_DIRECT(), "|O_DIRECT" ],
[ O_LARGEFILE(), O_LARGEFILE(), "|O_LARGEFILE" ],
[ O_NOFOLLOW(), O_NOFOLLOW(), "|O_NOFOLLOW" ],
) {
my ($mask, $flag, $name) = #$bits;
if (($flags & $mask) == $flag) {
$flags -= $flag;
print $name;
}
}
printf "| 0%o !!!", $flags if $flags;
print ")\n";
}
sub accessor {
my $var_ref = shift;
croak "accessor() requires a reference to a scalar var\n"
unless defined($var_ref) && ref($var_ref) eq "SCALAR";
return sub {
my $new = shift;
$$var_ref = $new if defined($new);
return $$var_ref;
}
}
sub fs_not_imp { return -ENOSYS() }
sub fs_flush {
# we're passed a path, but finding my coderef stuff from a path
# is a bit of a 'mare. flush the lot, won't hurt TOO much.
print "Flushing\n" if $debug;
%codecache = ();
return 0;
}
sub easy_getattr {
my ($mode, $size) = #_;
return (
0, 0, # $dev, $ino,
$mode,
1, # $nlink, see fuse.sourceforge.net/wiki/index.php/FAQ
$uid, $gid, # $uid, $gid,
0, # $rdev,
$size, # $size,
$ctime, $ctime, $ctime, # actually $atime, $mtime, $ctime,
1024, 1, # $blksize, $blocks,
);
}
sub fs_getattr {
my $path = shift;
my $obj = fetch($path);
# undef doesn't actually mean "file not found", it could be a coderef
# file-sub which has returned undef.
return easy_getattr(S_IFREG | 0200, 0) unless defined($obj);
switch (ref($obj)) {
case "ERROR" { # this is an error to be returned.
return -$$obj;
}
case "" { # this isn't a ref, it's a real string "file"
return easy_getattr(S_IFREG | 0644, length($obj));
}
# case "CODE" should never happen - already been run by fetch()
case "HASH" { # this is a directory hash
return easy_getattr(S_IFDIR | 0755, 1);
}
case "SCALAR" { # this is a scalar ref. we use these for symlinks.
return easy_getattr(S_IFLNK | 0777, 1);
}
else { # what the hell is this file?!?
print "+++ What on earth is ",ref($obj)," $path ?\n";
return easy_getattr(S_IFREG | 0000, 0);
}
}
}
sub fs_getdir {
my $obj = fetch(shift);
return -$$obj if ref($obj) eq "ERROR"; # THINK this is a good idea.
return -ENOENT() unless ref($obj) eq "HASH";
return (".", "..", sort(keys %$obj), 0);
}
sub fs_open {
# doesn't really need to open, just needs to check.
my $obj = fetch(shift);
my $flags = shift;
dump_open_flags($flags) if $debug;
# if it's undefined, and we're not writing to it, return an error
return -EBADF() unless defined($obj) or ($flags & O_ACCMODE());
switch (ref($obj)) {
case "ERROR" { return -$$obj; }
case "" { return 0 } # this is a real string "file"
case "HASH" { return -EISDIR(); } # this is a directory hash
else { return -ENOSYS(); } # what the hell is this file?!?
}
}
sub fs_read {
my $obj = fetch(shift);
my $size = shift;
my $off = shift;
return -ENOENT() unless defined($obj);
return -$$obj if ref($obj) eq "ERROR";
# any other types of refs are probably bad
return -ENOENT() if ref($obj);
if ($off > length($obj)) {
return -EINVAL();
} elsif ($off == length($obj)) {
return 0; # EOF
}
return substr($obj, $off, $size);
}
sub fs_readlink {
my $obj = fetch(shift);
return -$$obj if ref($obj) eq "ERROR";
return -EINVAL() unless ref($obj) eq "SCALAR";
return $$obj;
}
sub fs_release {
my ($path, $flags) = #_;
dump_open_flags($flags) if $debug;
return 0;
}
sub fs_statfs {
return (
255, # $namelen,
1,1, # $files, $files_free,
1,1, # $blocks, $blocks_avail, # 0,0 seems to hide it from df?
2, # $blocksize,
);
}
sub fs_truncate {
my $obj = fetch(shift, ""); # run anything to set it to ""
return -$$obj if ref($obj) eq "ERROR";
return 0;
}
sub fs_write {
my ($path, $buf, $off) = #_;
my $obj = fetch($path, $buf, $off); # this runs the coderefs!
return -$$obj if ref($obj) eq "ERROR";
return length($buf);
}
Final word: I didn't try to use the module itself (it's not listed in my distro package repository, and I was too lazy (sorry) to install it by cpanm or other way). But I think that if I'll have to just use FUSE with Perl, I'll probably just use Fuse::Simple instead of Fuse, maybe forking it. I'd use plain Fuse only for my academic research, I think...
Hope this helps.

GetOption - Perl - Referencing

So I have stumbled upon a little issue when trying to build out a simple "Airport Search Script" in Perl.
my $filename = '/home/student/perl-basic/topic-07/iata_airports.csv';
my $number = '1';
my $matching;
my $latitude;
my $longitude;
my $word = 'false';
GetOptions (
"filename=s" => \$filename,
"number=i" => \$number,
"matching=s" => \$matching,
"latitude=f" => \$latitude,
"longitude=f" => \$longitude,
"word=s" => \$word
);
sub parse_airports {
my $file = shift;
my $csv = Text::CSV->new( { binary => 1, eol => $/ } );
open ( my $fh, "<", $file ), or die "Error opening input file: $!";
my $ra_colnames = $csv->getline ( $fh );
$csv->column_names( #$ra_colnames );
my $ra_airports = $csv->getline_hr_all( $fh );
close ( $fh );
return $ra_airports;
}
sub get_name_matching_airports {
}
my $rah_airports = parse_airports( $filename );
my $rah_airports_found = [];
if ($matching) {
say "Up to $number airports matching $matching in $filename:";
$rah_airports_found = get_name_matching_airports(
airports => $rah_airports,
matching_string => $matching,
word => $word,
);
}
elsif ($latitude && $longitude) {
say "Up to $number airports near [$latitude, $longitude] in $filename:"
}
else {
say "Must have at least --matching, or --latitude and --longitude as arguments";
}
print pp($rah_airports_found);
So where I am struggling is in the "sub get_name_matching_airports"
Because you do not have the file let me explain the file structure.
It is a hash (ALL IATA Airports) with hashes (DETAILS of each airport). There are around 15 keys in each airport hash and one of the keys titles is (NAME). I have opened the file and parsed all the info into a hash ref which is returned at the end of the sub "parse_airports".
In the sub "get_name_matching_airports" I need to find additional airports with similar names based on the argument I passed in, into ($matching).
EXAMPLE: I parse (case-insensitive) "London" as an argument from the command line e.g. ./search_airports2 --matching London. In the sub "get_name_matching_airports" I will need to respond with any airport that has london (case-insensitive) in key(name).
Then push these newly found airports which are similar into the array "rah_airports_found" and in the end print this out.
SO I SOLVED MY PROBLEM WITH THE FOLLOWING CODE:
sub get_name_matching_airports {
my %params = (
airports => undef,
matching_string => undef,
word => undef,
#_
);
my #rah_airports_found;
my $ra_airports = $params{airports};
my $counter = 0;
foreach my $i ( #$ra_airports ) {
if ( $params{word} ) {
if ( $i->{name} eq $params{matching_string} ) {
push #rah_airports_found, $i;
$counter++;
}
}
else {
if ( $i->{name} =~ /$params{matching_string}/i ) {
push #rah_airports_found, $i;
$counter++;
}
if ( defined( $number ) && $counter == $number ) {
return \#rah_airports_found;
}
}
}
return \#rah_airports_found;
}

Example:
for my $Airport_rf (keys %{$rah_airports}) {
if ( $Airport_rf->{NAME} =~ m{\Q$matching\E}xi) {
# do your stuff here
}
}
If you don´t know the exact key of the hashref, you have to match the CLI parameter against all values.

what is the correct loop/conditional option for not finding a variable?

I am searching through three text files for one of four specific gene names (stored in $var#). When it is found, it takes the value found after the gene name and adds it to a count. We then average the value by taking total $count_exp# and dividing by the number of appearances within all files.
What is the proper way to let the user know when a gene name is not found in each file? I'm having difficulties handling the flow of this loop/conditional.
Here is a snippet of code that handles one of the three text files....
foreach $hyperosmotic(#hyperosmotic)
{
#hyperosmotic1=split(/\t/,$hyperosmotic);
$name=$hyperosmotic1[0];
$exp=$hyperosmotic1[1];
chomp $name;
chomp $exp;
if ($name eq $var1)
{
$count_exp1 = $count_exp1 + $exp;
$count_var1 = ++$count_var1;
}
elsif ($name eq $var2)
{
$count_exp2 = $count_exp2 + $exp;
$count_var2 = ++$count_var2;
}
elsif ($name eq $var3)
{
$count_exp3 = $count_exp3 + $exp;
$count_var3 = ++$count_var3;
}
elsif ($name eq $var4)
{
$count_exp4 = $count_exp4 + $exp;
$count_var4 = ++$count_var4;
}
}

You basically want to use arrays:
(and use strict; use warnings;)
my #count_var = (0)x4;
my #count_exp = (0)x4;
my #var = ($var1, $var2, ...);
HYPEROSMOTIC:
for my $hyperosmotic (#hyperosmotic) {
my ($name, $exp) = split /\t/, $hyperosmotic;
for my $i (0 .. $#var) {
if ($name eq $var[$i]) {
$count_exp[$i] += $exp;
$count_var[$i]++;
next HYPEROSMOTIC; # jump into next iteration of the labeled loop
}
}
# this code is only reached if no var matched:
die qq[I don't have a var for name "$name"];
# That just threw a fatal error. You may want to do something different.
}
You could improve efficiency by using hashes:
my %counts = (
$var1 => {exp => 0, var => 0},
$var2 => {exp => 0, var => 0},
$var3 => {exp => 0, var => 0},
$var4 => {exp => 0, var => 0},
);
for my $hyperosmotic (#hyperosmotic) {
my ($name, $exp) = split ...;
if (my $count = $counts{$name}) {
$count->{exp} += $exp;
$count->{var}++;
} else {
die qq[I don't have a var for name "$name"];
}
}

DNS Resolver result from multiple nameservers

I am passing two name servers to the Net::DNS::Resolver constructor but I am getting only one result back.
How should I change the code to receive result from all the name servers?
sub resolve_dns()
{
my $dns = $_[0];
my $res = Net::DNS::Resolver->new(
nameservers => [qw(24.116.197.232 114.130.11.67 )],
recurse => 0,
debug => 1,
tcp_timeout => 3
);
my $query = $res->search($dns);
if ($query) {
foreach my $rr ($query->answer) {
next unless $rr->type eq "A";
print $rr->address, "\n";
}
} else {
warn "query failed: ", $res->errorstring, "\n";
}
}

I presume the DNS servers after the first are there for fallback purposes and only a single reply will ever be returned.
The best way seems to be to manipulate the Net::DNS::Resolver server list and explicitly make a request to each of them.
This example code demonstrates the principle
sub resolve_dns {
my $address = shift;
my $res = Net::DNS::Resolver->new
recurse => 0,
debug => 1,
tcp_timeout => 3,
);
for my $ns (qw( 24.116.197.232 114.130.11.67 )) {
$res->nameservers($ns);
my $reply = $res->send($address);
if ($reply) {
my #type_a = grep $_->type eq 'A', $reply->answer;
print $_->address, "\n" for #type_a;
}
else {
warn sprintf "Query to %s failed: %s\n", $ns, $res->errorstring;
}
}
}

Perl Working On Two Hash References

I would like to compare the values of two hash references.
The data dumper of my first hash is this:
$VAR1 = {
'42-MG-BA' => [
{
'chromosome' => '19',
'position' => '35770059',
'genotype' => 'TC'
},
{
'chromosome' => '2',
'position' => '68019584',
'genotype' => 'G'
},
{
'chromosome' => '16',
'position' => '9561557',
'genotype' => 'G'
},
And the second hash is similar to this but with more hashes in the array. I would like to compare the genotype of my first and second hash if the position and the choromosome matches.
map {print "$_= $cave_snp_list->{$_}->[0]->{chromosome}\n"}sort keys %$cave_snp_list;
map {print "$_= $geno_seq_list->{$_}->[0]->{chromosome}\n"}sort keys %$geno_seq_list;
I could do that for the first array of the hashes.
Could you help me in how to work for all the arrays?
This is my actual code in full
#!/software/bin/perl
use strict;
use warnings;
use Getopt::Long;
use Benchmark;
use Config::Config qw(Sequenom.ini);
useDatabase::Conn;
use Data::Dumper;
GetOptions("sam=s" => \my $sample);
my $geno_seq_list = getseqgenotypes($sample);
my $cave_snp_list = getcavemansnpfile($sample);
#print Dumper($geno_seq_list);
print scalar %$geno_seq_list, "\n";
foreach my $sam (keys %{$geno_seq_list}) {
my $seq_used = $geno_seq_list->{$sam};
my $cave_used = $cave_snp_list->{$sam};
print scalar(#$geno_seq_list->{$_}) if sort keys %$geno_seq_list, "\n";
print scalar(#$cave_used), "\n";
#foreach my $seq2com (# {$seq_used } ){
# foreach my $cave2com( # {$cave_used} ){
# print $seq2com->{chromosome},":" ,$cave2com->{chromosome},"\n";
# }
#}
map { print "$_= $cave_snp_list->{$_}->[0]->{chromosome}\n" } sort keys %$cave_snp_list;
map { print "$_= $geno_seq_list->{$_}->[0]->{chromosome}\n" } sort keys %$geno_seq_list;
}
sub getseqgenotypes {
my $snpconn;
my $gen_list = {};
$snpconn = Database::Conn->new('live');
$snpconn->addConnection(DBI->connect('dbi:Oracle:pssd.world', 'sn', 'ss', { RaiseError => 1, AutoCommit => 0 }),
'pssd');
#my $conn2 =Database::Conn->new('live');
#$conn2->addConnection(DBI->connect('dbi:Oracle:COSI.world','nst_owner','nst_owner', {RaiseError =>1 , AutoCommit=>0}),'nst');
my $id_ind = $snpconn->execute('snp::Sequenom::getIdIndforExomeSample', $sample);
my $genotype = $snpconn->executeArrRef('snp::Sequenom::getGenotypeCallsPosition', $id_ind);
foreach my $geno (#{$genotype}) {
push #{ $gen_list->{ $geno->[1] } }, {
chromosome => $geno->[2],
position => $geno->[3],
genotype => $geno->[4],
};
}
return ($gen_list);
} #end of sub getseqgenotypes
sub getcavemansnpfile {
my $nstconn;
my $caveman_list = {};
$nstconn = Database::Conn->new('live');
$nstconn->addConnection(
DBI->connect('dbi:Oracle:CANP.world', 'nst_owner', 'NST_OWNER', { RaiseError => 1, AutoCommit => 0 }), 'nst');
my $id_sample = $nstconn->execute('nst::Caveman::getSampleid', $sample);
#print "IDSample: $id_sample\n";
my $file_location = $nstconn->execute('nst::Caveman::getCaveManSNPSFile', $id_sample);
open(SNPFILE, "<$file_location") || die "Error: Cannot open the file $file_location:$!\n";
while (<SNPFILE>) {
chomp;
next if /^>/;
my #data = split;
my ($nor_geno, $tumor_geno) = split /\//, $data[5];
# array of hash
push #{ $caveman_list->{$sample} }, {
chromosome => $data[0],
position => $data[1],
genotype => $nor_geno,
};
} #end of while loop
close(SNPFILE);
return ($caveman_list);
}

The problem that I see is that you're constructing a tree for generic storage of data, when what you want is a graph, specific to the task. While you are constructing the record, you could also be constructing the part that groups data together. Below is just one example.
my %genotype_for;
my $record
= { chromosome => $data[0]
, position => $data[1]
, genotype => $nor_geno
};
push #{ $gen_list->{ $geno->[1] } }, $record;
# $genotype_for{ position }{ chromosome }{ name of array } = genotype code
$genotype_for{ $data[1] }{ $data[0] }{ $sample } = $nor_geno;
...
return ( $caveman_list, \%genotype_for );
In the main line, you receive them like so:
my ( $cave_snp_list, $geno_lookup ) = getcavemansnpfile( $sample );
This approach at least allows you to locate similar position and chromosome values. If you're going to do much with this, I might suggest an OO approach.
Update
Assuming that you wouldn't have to store the label, we could change the lookup to
$genotype_for{ $data[1] }{ $data[0] } = $nor_geno;
And then the comparison could be written:
foreach my $pos ( keys %$small_lookup ) {
next unless _HASH( my $sh = $small_lookup->{ $pos } )
and _HASH( my $lh = $large_lookup->{ $pos } )
;
foreach my $chrom ( keys %$sh ) {
next unless my $sc = $sh->{ $chrom }
and my $lc = $lh->{ $chrom }
;
print "$sc:$sc";
}
}
However, if you had limited use for the larger list, you could construct the specific case
and pass that in as a filter when creating the longer list.
Thus, in whichever loop creates the longer list, you could just go
...
next unless $sample{ $position }{ $chromosome };
my $record
= { chromosome => $chromosome
, position => $position
, genotype => $genotype
};
...

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Perl LDAP search - over 1500 member in a group - perl

Related

Virtual Filesystem in Perl with Fuse

GetOption - Perl - Referencing

what is the correct loop/conditional option for not finding a variable?

DNS Resolver result from multiple nameservers

Perl Working On Two Hash References

Categories

Resources