Why some entries are missing in Perl ENV hash - perl

Perl is giving me an undef value when I access a variable that is supposed to be defined in the %ENV hash. How is this possible?
root@23cd5f45def7:~/bin$ perl -e 'warn $ENV{SHELL}'
Warning: something's wrong at -e line 1.
I would expect perl to output /bin/bash instead.
More info on the environment:
root@23cd5f45def7:~/bin$ echo $SHELL
/bin/bash
root@23cd5f45def7:~/bin$ $SHELL --version
GNU bash, version 4.2.37(1)-release (x86_64-pc-linux-gnu)
Copyright (C) 2011 Free Software Foundation, Inc.
...
root@23cd5f45def7:~/bin$ perl -v
This is perl 5, version 14, subversion 2 (v5.14.2) built for x86_64-linux-gnu-thread-multi
(with 91 registered patches, see perl -V for more detail)
...
I am running this on Debian wheezy in a docker container.
The image was created with
sudo debootstrap wheezy ../_build http://ftp.us.debian.org/debian
sudo tar -C ../_build -c . | docker import - wheezy/bootstrap
I get the same behaviour with perl-5.24.1 compiled manually from source.

The error message
Warning: something's wrong at -e line 1.
indicates that the environment variable $SHELL does not exist or is not exported.
You can list the exported variables using the export command. You can add SHELL to the exported variables using the command:
export SHELL
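A quick illustration of the difference (the variable name here is just an example): an unexported shell variable is visible to the shell itself but never reaches a child process such as perl:
FOO=bar
perl -e 'print defined $ENV{FOO} ? "present\n" : "absent\n"'    # prints "absent"
export FOO
perl -e 'print defined $ENV{FOO} ? "present\n" : "absent\n"'    # prints "present"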

Something is wrong with your installation. It looks like the string
Warning: something's wrong at -e line 1.
would only ever be produced by the code below, at line 461 of pp_sys.c (that is, when the argument to warn is undefined or empty and $@ is empty too). I therefore deduce that something sanitizes the environment before perl is invoked. You might also want to examine root's .profile, .bashrc, .bash_profile and other possibly relevant configuration files.
421 PP(pp_warn)
422 {
423 dSP; dMARK;
424 SV *exsv;
425 STRLEN len;
426 if (SP - MARK > 1) {
427 dTARGET;
428 do_join(TARG, &PL_sv_no, MARK, SP);
429 exsv = TARG;
430 SP = MARK + 1;
431 }
432 else if (SP == MARK) {
433 exsv = &PL_sv_no;
434 EXTEND(SP, 1);
435 SP = MARK + 1;
436 }
437 else {
438 exsv = TOPs;
439 if (SvGMAGICAL(exsv)) exsv = sv_mortalcopy(exsv);
440 }
441
442 if (SvROK(exsv) || (SvPV_const(exsv, len), len)) {
443 /* well-formed exception supplied */
444 }
445 else {
446 SV * const errsv = ERRSV;
447 SvGETMAGIC(errsv);
448 if (SvROK(errsv)) {
449 if (SvGMAGICAL(errsv)) {
450 exsv = sv_newmortal();
451 sv_setsv_nomg(exsv, errsv);
452 }
453 else exsv = errsv;
454 }
455 else if (SvPOKp(errsv) ? SvCUR(errsv) : SvNIOKp(errsv)) {
456 exsv = sv_newmortal();
457 sv_setsv_nomg(exsv, errsv);
458 sv_catpvs(exsv, "\t...caught");
459 }
460 else {
461 exsv = newSVpvs_flags("Warning: something's wrong", SVs_TEMP); ## <-- Here ...
462 }
463 }
464 if (SvROK(exsv) && !PL_warnhook)
465 Perl_warn(aTHX_ "%" SVf, SVfARG(exsv));
466 else warn_sv(exsv);
467 RETSETYES;
468 }
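To see which variables actually do survive into the Perl process, a plain one-liner (core Perl only) dumps the whole environment:
perl -e 'printf "%s=%s\n", $_, $ENV{$_} for sort keys %ENV'
Comparing its output with the shell's export listing should show exactly what gets stripped.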

Related

Perl Net::Server Log Buffer cuts off at 4096 characters

I am using Perl Net::Server and its built-in log method like
$self->log( 1, lc( $json->encode($callInfo) ) );
The issue I am having is that sometimes the data in $callInfo is larger than 4096 characters: the first 4096 characters are written to the log file, another child then writes its own $callInfo, and only afterwards does the rest of the original $callInfo get logged.
Example:
Assuming abcdef is over 4096 characters.
callinfo1 -> child process tries to write 'abcdef' to log where 'abc' would be written and interrupted by the next child process
callinfo2 -> another child process writes to the log and then the remaining data 'def' from callinfo1 would get written.
I have tried adding the following to change the buffer size to 8192, but the issue remains.
sub post_configure {
    my $self = shift;
    my $prop = $self->{server};
    $prop->{log_level} = 1;

    if( $prop->{log_file} ){
        local $/ = 8192;
        open(_SERVER_LOG, ">>$prop->{log_file}") or die "Couldn't open log file \"$prop->{log_file}\" [$!].";
        _SERVER_LOG->autoflush(1);
        #open(our $logHandler, '>>', $prop->{log_file});
        #$logHandler->autoflush(1);
        $prop->{chown_log_file} = 1;
    }
}

sub log {
    my $self = shift;
    my $prop = $self->{server};
    my $level = shift;
    $self->write_to_log_hook($level, @_);
}

sub write_to_log_hook {
    my $self = shift;
    my $prop = $self->{server};
    my $level = shift;
    local $_ = shift || '';
    chomp;
    s/([^\n\ -\~])/sprintf("%%%02X",ord($1))/eg;

    if( $prop->{log_file} ){
        #if(substr($_, 0, 1) eq '{')
        #{
        print _SERVER_LOG $_, "\n";
        #print $logHandler $_, "\n";
        #}
    }
}
Any ideas on how to get the log buffer to finish before another child process logs?
Thanks in advance.
Perl doesn't send everything to the OS at once, even with autoflush, so it's possible for a single print to be interleaved with writes from other processes.
$ strace perl -e'STDOUT->autoflush; print "x" x 9999' 2>&1 >/dev/null | grep write
write(1, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 8192) = 8192
write(1, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 1807) = 1807
That said, the OS only guarantees that writes under a certain size are atomic (for pipes, POSIX promises atomicity only up to PIPE_BUF, which is 4096 bytes on Linux; regular files get no such guarantee at all), so large prints could still be interleaved even if Perl sent everything to the OS at once.
This means it's up to the processes to synchronize themselves using some form of mutual exclusion (e.g. by using a lock).
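A minimal sketch of that approach, assuming every child appends to the same file (the path and the helper name log_line are illustrative):
use Fcntl qw(:flock);
use IO::Handle;

open my $log, '>>', '/var/log/myserver.log' or die "open: $!";
$log->autoflush(1);

sub log_line {
    my ($line) = @_;
    flock($log, LOCK_EX) or die "flock: $!";   # block until this process holds the lock
    print {$log} $line, "\n";                  # the whole record goes out while we hold it
    flock($log, LOCK_UN);                      # release so the next child can write
}
Because the handle is opened in append mode, each locked print lands at the current end of the file no matter what other children wrote in between.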

Modifying perl script to not print duplicates and extract sequences of a certain length

I want to first apologize for the biological nature of this post. I thought I should post some background first. I have a set of gene files that contain anywhere from one to five DNA sequences from different species. I used a bash shell script to perform blastn with each gene file as a query and a file of all transcriptome sequences (all_transcriptome_seq.fasta) from the five species as the subject. I now want to process these output files (and there are many) so that, for each gene, I get all subject sequences that hit in one file, with duplicate sequences removed (keeping one copy), and with the sequences trimmed to the length that actually hit the query.
Here is what the blastn output looks like for one gene file (columns: qseqid qlen sseqid slen qframe qstart qend sframe sstart send evalue bitscore pident nident length)
Acur_01000750.1_OFAS014956-RA-EXON04 248 Apil_comp17195_c0_seq1 1184 1 1 248 1 824 1072 2e-73 259 85.60 214 250
Acur_01000750.1_OFAS014956-RA-EXON04 248 Atri_comp5613_c0_seq1 1067 1 2 248 1 344 96 8e-97 337 91.16 227 249
Acur_01000750.1_OFAS014956-RA-EXON04 248 Acur_01000750.1 992 1 1 248 1 655 902 1e-133 459 100.00 248 248
Acur_01000750.1_OFAS014956-RA-EXON04 248 Btri_comp17734_c0_seq1 1001 1 1 248 1 656 905 5e-69 244 84.40 211 250
Btri_comp17734_c0_seq1_OFAS014956-RA-EXON04 250 Atri_comp5613_c0_seq1 1067 1 2 250 1 344 96 1e-60 217 82.33 205 249
Btri_comp17734_c0_seq1_OFAS014956-RA-EXON04 250 Acur_01000750.1 992 1 1 250 1 655 902 5e-69 244 84.40 211 250
Btri_comp17734_c0_seq1_OFAS014956-RA-EXON04 250 Btri_comp17734_c0_seq1 1001 1 1 250 1 656 905 1e-134 462 100.00 250 250
I've been working on a perl script that would, in short, take the sseqid column to pull out the corresponding sequences from the all_transcriptome_seq.fasta file, place these into a new file, and trim the transcripts to the sstart and send positions. Here is the script, so far:
#!/usr/bin/env perl
use warnings;
use strict;
use Data::Dumper;
############################################################################
# blastn_post-processing.pl v. 1.0 by Michael F., XXXXXX
############################################################################
my($progname) = $0;
############################################################################
# Initialize variables
############################################################################
my($jter);
my($com);
my($t1);
if ( @ARGV != 2 ) {
    print "Usage:\n \$ $progname <infile> <transcriptomes>\n";
    print "       infile = tab-delimited blastn text file\n";
    print "       transcriptomes = fasta file of all transcriptomes\n";
    print "exiting...\n";
    exit;
}
my($infile)=$ARGV[0];
my($transcriptomes)=$ARGV[1];
############################################################################
# Read the input file
############################################################################
print "Reading the input file... ";
open (my $INF, $infile) or die "Unable to open file";
my @data = <$INF>;
print @data;
close($INF) or die "Could not close file $infile.\n";
my($nlines) = $#data + 1;
my($inlines) = $nlines - 1;
print "$nlines blastn hits read\n\n";
############################################################################
# Extract hits and place sequences into new file
############################################################################
my #temparray;
my #templine;
my($seqfname);
open ($INF, $infile) or die "Could not open file $infile for input.\n";
@temparray = <$INF>;
close($INF) or die "Could not close file $infile.\n";
$t1 = $#temparray + 1;
print "$infile\t$t1\n";
$seqfname = "$infile" . ".fasta";
if ( -e $seqfname ) {
    print " --> $seqfname exists. overwriting\n";
    unlink($seqfname);
}
# iterate through the individual hits
for ($jter=0; $jter<$t1; $jter++) {
    (@templine) = split(/\s+/, $temparray[$jter]);
    $com = "./extract_from_genome2 $transcriptomes $templine[2] $templine[8] $templine[9] $templine[2]";
    # print "$com\n";
    system("$com");
    system("cat temp.3 >> $seqfname");
} # end for ($jter=0; $jter<$t1...
# Arguments for "extract_from_genome2"
# // argv[1] = name of genome file
# // argv[2] = gi number for contig
# // argv[3] = start of subsequence
# // argv[4] = end of subsequence
# // argv[5] = name of output sequence
Using this script, here is the output I'm getting:
>Apil_comp17195_c0_seq1
GATTCTTGCATCTGCAGTAAGACCAGAAATGCTCATTCCTATATGGCTATCTAATGGTATTATTTTTTTCTGATGTGCTGATAATTCAGACGAAGCTCTTTTAAGAGCCACAAGAACTGCATACTGCTTGTTTTTTACTCCAACAGTAGCAGCTCCCAGTTTTACAGCTTCCATTGCATATTCGACTTGGTGCAGGCGTCCCTGGGGACTCCAGACGGTAACGTCAGAATCATACTGGTTACGGAACA
>Atri_comp5613_c0_seq1
GAGAATTCTAGCATCAGCAGTGAGGCCTGAAATACTCATGCCTATGTGACTATCTAGAGGTATTATTTTTTTTTGATGAGCTGACAGTTCAGAAGAAGCTCTTTTGAGAGCTACAAGAACTGCATACTGTTTATTTTTTACTCCAACTGTTGCTGCTCCAAGCTTTACAGCCTCCATTGCATATTCCACTTGGTGTAAACGCCCCTGAGGACTCCATACCGTAACATCAGAATCATACTGATTACGGA
>Acur_01000750.1
GAATTCTAGCGTCAGCAGTGAGTCCTGAAATACTCATCCCTATGTGGCTATCTAGAGGTATTATTTTTTCTGATGGGCCGACAGTTCAGAGGATGCTCTTTTAAGAGCCACAAGAACTGCATACTCTTTATTTTTACTCCAACAGTAGCAGCTCCAAGCTTCACAGCCTCCATTGCATATTCCACCTGGTGTAAACGTCCCTGAGGGCTCCATACCGTAACATCAGAATCATACTGGTTACGGAACA
>Btri_comp17734_c0_seq1
GAATCCTTGCATCTGCAGTAAGTCCAGAAATGCTCATTCCAATATGGCTATCTAATGGTATTATTTTTTTCTGGTGAGCAGACAATTCAGATGATGCTCTTTTAAGAGCTACCAGTACTGCAAAATCATTGTTCTTCACTCCAACAGTTGCAGCACCTAATTTGACTGCCTCCATTGCATACTCCACTTGGTGCAATCTTCCCTGAGGGCTCCATACCGTAACATCAGAATCATACTGGTTACGGAACA
>Atri_comp5613_c0_seq1
GAGAATTCTAGCATCAGCAGTGAGGCCTGAAATACTCATGCCTATGTGACTATCTAGAGGTATTATTTTTTTTTGATGAGCTGACAGTTCAGAAGAAGCTCTTTTGAGAGCTACAAGAACTGCATACTGTTTATTTTTTACTCCAACTGTTGCTGCTCCAAGCTTTACAGCCTCCATTGCATATTCCACTTGGTGTAAACGCCCCTGAGGACTCCATACCGTAACATCAGAATCATACTGATTACGGA
>Acur_01000750.1
GAATTCTAGCGTCAGCAGTGAGTCCTGAAATACTCATCCCTATGTGGCTATCTAGAGGTATTATTTTTTCTGATGGGCCGACAGTTCAGAGGATGCTCTTTTAAGAGCCACAAGAACTGCATACTCTTTATTTTTACTCCAACAGTAGCAGCTCCAAGCTTCACAGCCTCCATTGCATATTCCACCTGGTGTAAACGTCCCTGAGGGCTCCATACCGTAACATCAGAATCATACTGGTTACGGAACA
>Btri_comp17734_c0_seq1
GAATCCTTGCATCTGCAGTAAGTCCAGAAATGCTCATTCCAATATGGCTATCTAATGGTATTATTTTTTTCTGGTGAGCAGACAATTCAGATGATGCTCTTTTAAGAGCTACCAGTACTGCAAAATCATTGTTCTTCACTCCAACAGTTGCAGCACCTAATTTGACTGCCTCCATTGCATACTCCACTTGGTGCAATCTTCCCTGAGGGCTCCATACCGTAACATCAGAATCATACTGGTTACGGAACA
As you can see, it's pretty close to what I'm wanting. Here are the two issues I have and cannot seem to figure out how to resolve with my script. The first is that a sequence may occur more than once in the sseqid column, and with the script in its current form, it will print out duplicates of these sequences. I only need one. How can I modify my script to not duplicate sequences (i.e., how do I only retain one but remove the other duplicates)? Expected output:
>Apil_comp17195_c0_seq1
GATTCTTGCATCTGCAGTAAGACCAGAAATGCTCATTCCTATATGGCTATCTAATGGTATTATTTTTTTCTGATGTGCTGATAATTCAGACGAAGCTCTTTTAAGAGCCACAAGAACTGCATACTGCTTGTTTTTTACTCCAACAGTAGCAGCTCCCAGTTTTACAGCTTCCATTGCATATTCGACTTGGTGCAGGCGTCCCTGGGGACTCCAGACGGTAACGTCAGAATCATACTGGTTACGGAACA
>Atri_comp5613_c0_seq1
GAGAATTCTAGCATCAGCAGTGAGGCCTGAAATACTCATGCCTATGTGACTATCTAGAGGTATTATTTTTTTTTGATGAGCTGACAGTTCAGAAGAAGCTCTTTTGAGAGCTACAAGAACTGCATACTGTTTATTTTTTACTCCAACTGTTGCTGCTCCAAGCTTTACAGCCTCCATTGCATATTCCACTTGGTGTAAACGCCCCTGAGGACTCCATACCGTAACATCAGAATCATACTGATTACGGA
>Acur_01000750.1
GAATTCTAGCGTCAGCAGTGAGTCCTGAAATACTCATCCCTATGTGGCTATCTAGAGGTATTATTTTTTCTGATGGGCCGACAGTTCAGAGGATGCTCTTTTAAGAGCCACAAGAACTGCATACTCTTTATTTTTACTCCAACAGTAGCAGCTCCAAGCTTCACAGCCTCCATTGCATATTCCACCTGGTGTAAACGTCCCTGAGGGCTCCATACCGTAACATCAGAATCATACTGGTTACGGAACA
>Btri_comp17734_c0_seq1
GAATCCTTGCATCTGCAGTAAGTCCAGAAATGCTCATTCCAATATGGCTATCTAATGGTATTATTTTTTTCTGGTGAGCAGACAATTCAGATGATGCTCTTTTAAGAGCTACCAGTACTGCAAAATCATTGTTCTTCACTCCAACAGTTGCAGCACCTAATTTGACTGCCTCCATTGCATACTCCACTTGGTGCAATCTTCCCTGAGGGCTCCATACCGTAACATCAGAATCATACTGGTTACGGAACA
The second is that the script is not quite extracting the right base pairs. It's super close, off by one or two, but it's not exact.
For example, take the first subject hit Apil_comp17195_c0_seq1. The sstart and send values are 824 and 1072, respectively. When I go to the all_transcriptome_seq.fasta, I get
AAGATTCTTGCATCTGCAGTAAGACCAGAAATGCTCATTCCTATATGGCTATCTAATGGTATTATTTTTTTCTGATGTGCTGATAATTCAGACGAAGCTCTTTTAAGAGCCACAAGAACTGCATACTGCTTGTTTTTTACTCCAACAGTAGCAGCTCCCAGTTTTACAGCTTCCATTGCATATTCGACTTGGTGCAGGCGTCCCTGGGGACTCCAGACGGTAACGTCAGAATCATACTGGTTACGGAAC
at that base pair range (which is what I'm expecting), not
GATTCTTGCATCTGCAGTAAGACCAGAAATGCTCATTCCTATATGGCTATCTAATGGTATTATTTTTTTCTGATGTGCTGATAATTCAGACGAAGCTCTTTTAAGAGCCACAAGAACTGCATACTGCTTGTTTTTTACTCCAACAGTAGCAGCTCCCAGTTTTACAGCTTCCATTGCATATTCGACTTGGTGCAGGCGTCCCTGGGGACTCCAGACGGTAACGTCAGAATCATACTGGTTACGGAACA
as output by my script. You will also notice that the sequence output by my script is slightly shorter than it should be. Does anyone know how I can fix these issues in my script?
Thanks, and sorry for the lengthy post!
Edit 1: a solution was offered that works for some of the infiles. However, some infiles were causing the script to output fewer sequences than expected. Here is one such infile with 9 hits, from which I was expecting only 4 sequences.
Note: this issue has been largely resolved by the solution provided in the answer section below.
Apil_comp16418_c0_seq1_OFAS000119-RA-EXON01 1587 Apil_comp16418_c0_seq1 2079 1 1 1587 1 416 2002 0.0 2931 100.00 1587 1587
Apil_comp16418_c0_seq1_OFAS000119-RA-EXON01 1587 Atri_comp13712_c0_seq1 1938 1 1 1587 1 1651 75 0.0 1221 80.73 1286 1593
Apil_comp16418_c0_seq1_OFAS000119-RA-EXON01 1587 Ctom_01003023.1 2162 1 1 1406 1 1403 1 0.0 1430 85.07 1197 1407
Atri_comp13712_c0_seq1_OFAS000119-RA-EXON01 1441 Apil_comp16418_c0_seq1 2079 1 1 1437 1 1866 430 0.0 1170 81.43 1175 1443
Atri_comp13712_c0_seq1_OFAS000119-RA-EXON01 1441 Atri_comp13712_c0_seq1 1938 1 1 1441 1 201 1641 0.0 2662 100.00 1441 1441
Atri_comp13712_c0_seq1_OFAS000119-RA-EXON01 1441 Acur_01000228.1 2415 1 1 1440 1 2231 797 0.0 1906 90.62 1305 1440
Ctom_01003023.1_OFAS000119-RA-EXON01 1289 Apil_comp16418_c0_seq1 2079 1 3 1284 1 1714 430 0.0 1351 85.69 1102 1286
Ctom_01003023.1_OFAS000119-RA-EXON01 1289 Acur_01000228.1 2415 1 1 1287 1 2084 797 0.0 1219 83.81 1082 1291
Ctom_01003023.1_OFAS000119-RA-EXON01 1289 Ctom_01003023.1 2162 1 1 1289 1 106 1394 0.0 2381 100.00 1289 1289
Edit 2: There is still the occasional output with fewer sequences than expected, although not as often after incorporating the Edit 1 modifications into my script (i.e., accounting for the reverse direction). I cannot figure out why the script outputs fewer sequences in these other cases. Below is the infile in question. The output is lacking Btri_comp15171_c0_seq1:
Apil_comp19456_c0_seq1_OFAS000248-RA-EXON07 2464 Apil_comp19456_c0_seq1 3549 1 1 2464 1 761 3224 0.0 4551 100.00 2464 2464
Apil_comp19456_c0_seq1_OFAS000248-RA-EXON07 2464 Btri_comp15171_c0_seq1 3766 1 1 2456 1 3046 591 0.0 1877 80.53 1985 2465
Btri_comp15171_c0_seq1_OFAS000248-RA-EXON07 2457 Apil_comp19456_c0_seq1 3549 1 1 2457 1 3214 758 0.0 1879 80.54 1986 2466
Btri_comp15171_c0_seq1_OFAS000248-RA-EXON07 2457 Atri_comp28646_c0_seq1 1403 1 1256 2454 1 1401 203 0.0 990 81.60 980 1201
Btri_comp15171_c0_seq1_OFAS000248-RA-EXON07 2457 Btri_comp15171_c0_seq1 3766 1 1 2457 1 593 3049 0.0 4538 100.00 2457 2457
You can use a hash to remove duplicates.
The code below removes duplicates based on subject length (it keeps the row with the larger subject length).
Just replace your # iterate through the individual hits part with:
# iterate through the individual hits
# first pass: record the largest subject length seen for each sseqid
my %filterhash;
my $subject_length;
for ($jter=0; $jter<$t1; $jter++) {
    (@templine) = split(/\s+/, $temparray[$jter]);
    $subject_length = $templine[9] - $templine[8];
    if (exists $filterhash{$templine[2]}) {
        if ($filterhash{$templine[2]} < $subject_length) {
            $filterhash{$templine[2]} = $subject_length;
        }
    }
    else {
        $filterhash{$templine[2]} = $subject_length;
    }
}
# second pass: extract each sseqid exactly once, from a row with the largest subject length
my %printhash;
for ($jter=0; $jter<$t1; $jter++) {
    (@templine) = split(/\s+/, $temparray[$jter]);
    $subject_length = $templine[9] - $templine[8];
    if ($filterhash{$templine[2]} == $subject_length
        and not exists $printhash{$templine[2]}) {
        $printhash{$templine[2]} = 1;
        $com = "./extract_from_genome2 $transcriptomes $templine[2] $templine[8] $templine[9] $templine[2]";
        # print "$com\n";
        system("$com");
        system("cat temp.3 >> $seqfname");
    }
} # end for ($jter=0; $jter<$t1...
Hope this will help you.
Update for the edit:
For negative strand hits you need to replace
$subject_length = $templine[9] -$templine[8];
with
if ($templine[8] > $templine[9]) {
    $subject_length = $templine[8] - $templine[9];
} else {
    $subject_length = $templine[9] - $templine[8];
}
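For what it's worth, Perl's built-in abs() collapses that branch into a single line:
$subject_length = abs($templine[9] - $templine[8]);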
You also need to update your extract_from_genome2 code for negative strand sequences.

How to capture several text from a file and print it with specific format?

I have a file with the following content:
CLASS
1001
CATEGORY
11 12 13 15
16 17
CLASS
3101
CATEGORY
900 901 902 904 905 907
908 909
910 912 913
CLASS
8000
CATEGORY
400 401 402 403
and I'd like to reformat it using perl or awk to get the following result:
1001 11&12&13&15&16&17
3101 900&901&902&904&905&907&908&909&910&912&913
8000 400&401&402&403
Your help would be appreciated. (I used to do this with Excel VBA, but this time I'd like to keep it simple using perl or awk.) Thanks in advance. :)
perl -lne'
BEGIN{ $/ ="CLASS"; $" ="&" }
($x, @F) = /\d+/g or next;
print "$x @F"
' file
output
1001 11&12&13&15&16&17
3101 900&901&902&904&905&907&908&909&910&912&913
8000 400&401&402&403
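Two tricks make this work: setting $/ to "CLASS" makes perl read one whole CLASS block per "line", and $" ("&") is the separator perl inserts between array elements when @F is interpolated into the double-quoted print string.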
Another awk version (a sep variable inserts the & between category lines as well as between the numbers within a line):
awk '/CLASS/ {c=1; f=0; if (NR>1) print a; next} c {a=$0 " "; c=0} /CATEGORY/ {f=1; c=0; sep=""; next} f {gsub(/ /,"\\&"); a=a sep $0; sep="&"} END {print a}' file
1001 11&12&13&15&16&17
3101 900&901&902&904&905&907&908&909&910&912&913
8000 400&401&402&403
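Note the \\& in the gsub() replacement: a bare & there stands for the matched text (here, the space itself), so it must be escaped to insert a literal ampersand.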

nrpe unable to run custom perl script: Return Code: 1, Output: NRPE: Unable to read output

I'm trying to implement a custom perl nagios script to check for rogue dhcp servers remotely with nrpe. On the central server, when I run:
/usr/local/nagios/libexec/check_nrpe -H 10.9.0.25 -c check_roguedhcp
In my debugging logs I'm seeing this:
Host is asking for command 'check_roguedhcp' to be run...
Running command: sudo /usr/lib64/nagios/plugins/check_roguedhcp.pl
Command completed with return code 1 and output:
Return Code: 1, Output: NRPE: Unable to read output
Locally, if I run the script (even as the nrpe user), I get the expected output.
On the local server my /etc/nagios/nrpe.cfg has the following settings:
command[check_roguedhcp]=sudo /usr/lib64/nagios/plugins/check_roguedhcp.pl
command[check_dhcp]=sudo /usr/lib64/nagios/plugins/check_dhcp -v
nrpe_user=nrpe
nrpe_group=nagios
ps aux shows nrpe is running as user nrpe (nrpe is in group nagios)
nrpe 5941 0.0 0.1 52804 2384 ? Ss 08:25 0:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
I've added the command to /etc/sudoers
%nagios ALL=(ALL) NOPASSWD: /usr/lib/nagios64/plugins/check_dhcp, /usr/lib64/nagios/plugins/check_roguedhcp.pl
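(As an aside: the first path in that sudoers line, /usr/lib/nagios64/plugins/check_dhcp, does not match the path used in nrpe.cfg, /usr/lib64/nagios/plugins/check_dhcp, so the NOPASSWD rule would never apply to that command.)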
on my central server that does the nrpe calls, i have the following service groups and configurations:
define servicegroup{
servicegroup_name rogue_dhcp
alias All dhcp monitors
}
define service{
name security-service
servicegroups rogue_dhcp
register 0
max_check_attempts 1
}
Nagios can run any other script (check_users, etc.) via nrpe on this server.
Here's the perl script itself, though we know that the file executes locally just fine.
#!/usr/bin/perl -w
# nagios: -epn
# the above makes nagios run the script separately.
use POSIX;
use lib "/usr/lib64/nagios/plugins";
use utils qw(%ERRORS);

sub fail_usage {
    if (scalar @_) {
        print "$0: error: \n";
        map { print " $_\n"; } @_;
    }
    print "$0: Usage: \n";
    print "$0 [-v [-v [-v]]] [ []] \n";
    print "$0 [-v [-v [-v]]] [-s] [[-s] [[-s] ]] \n";
    print " \n";
    exit 3;
}

my $verbose = 0;
my %servers=(
    "x", "10.x.x.x",
    "x", "10.x.x.x",
    "x", "10.x.x.x",
    "x", "10.x.x.x"
);

# examine commandline args
while ($ARGV=$ARGV[0]) {
    my $myarg = $ARGV;
    if ($ARGV eq '-s') {
        shift @ARGV;
        if (!($ARGV = $ARGV[0])) { fail_usage ("$myarg needs an argument"); }
        if ($ARGV =~ /^-/) { fail_usage ("$myarg must be followed by an argument"); }
        if (!defined($servers{$ARGV})) { $servers{$ARGV}=1; }
    }
    elsif ($ARGV eq '-v' ) { $verbose++; }
    elsif ($ARGV eq '-h' or $ARGV eq '--help' ) { fail_usage ; }
    elsif ($ARGV =~ /^-/ ) { fail_usage " invalid option ($ARGV)"; }
    elsif ($ARGV =~ /^\d+\.\d+\.\d+\.\d+$/)
    # servers should be ip addresses. I'm not doing detailed checks for this.
    { if (!defined($servers{$ARGV})) { $servers{$ARGV}=1; } }
    else { last; }
    shift @ARGV;
}
# for some reason I can't test for empty ARGs in the while loop
@ARGV = grep {!/^\s*$/} @ARGV;
if (scalar @ARGV) { fail_usage "didn't understand arguments: (".join (" ",@ARGV).")"; }

my $serversn = scalar keys %servers;

if ($verbose > 2) {
    print "verbosity=($verbose)\n";
    print "servers = ($serversn)\n";
    if ($serversn) { for my $i (keys %servers) { print "server ($i)\n"; } }
}

if (!$serversn) { fail_usage "no servers"; }
my $responses=0;
my $responders="";
my @check_dhcp = qx{/usr/lib64/nagios/plugins/check_dhcp -v};
foreach my $value (@check_dhcp) {
    if ($value =~ /Added offer from server \# /i){
        $value =~ m/(\d+\.\d+\.\d+\.\d+)/i;
        my $host = $1;
        # we find a server in our list
        if (defined($servers{$host})) { $responses++; $responders.="$host "; }
        # we find a rogue DHCP server. Danger Will Robinson!
        else {
            print "DHCP:CRITICAL: DHCP service running on $host";
            exit $ERRORS{'OK'}
        }
    }
}
# we saw all the servers in our list. All is good.
if ($responses == $serversn) {
    print "DHCP:OK: $responses of $serversn Expected Responses to DHCP Broadcast";
    exit $ERRORS{'OK'};
}
# we found no DHCP responses.
if ($responses == 0) {
    print "DHCP:OK: no rogue servers detected!!!!#!##";
    exit $ERRORS{'OK'}
}
# we found less DHCP servers than we should have. Oh Nos!
$responders =~ s/ $//;
print "DHCP:OK: $responses of $serversn Responses to DHCP Broadcast. ($responders) responded. ";
exit $ERRORS{'OK'};
Here's what I am seeing (of relevance) when I do an strace of the nrpe process.
955 6950 stat("/usr/lib64/nagios/plugins/check_roguedhcp.pl", {st_mode=S_IFREG|S_ISUID|S_ISGID|0755, st_size=2799, ...}) = 0
956 6950 setresuid(4294967295, 4294967295, 4294967295) = 0
957 6950 setresgid(4294967295, 536347864, 4294967295) = 0
958 6950 setgroups(3, [536347864, 536347137, 536353632]) = 0
959 6950 open("/dev/tty", O_RDWR|O_NOCTTY) = -1 ENXIO (No such device or address)
960 6950 socket(PF_NETLINK, SOCK_RAW, 9) = 3
961 6950 fcntl(3, F_SETFD, FD_CLOEXEC) = 0
962 6950 fcntl(3, F_SETFD, FD_CLOEXEC) = 0
963 6950 ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff3de81ac0) = -1 ENOTTY (Inappropriate ioctl for device)
964 6950 ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff3de81ac0) = -1 EINVAL (Invalid argument)
965 6950 ioctl(2, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff3de81ac0) = -1 ENOTTY (Inappropriate ioctl for device)
966 6950 getcwd("/", 4096) = 2
967 6950 sendto(3, "d\0\0\0c\4\5\0\1\0\0\0\0\0\0\0cwd=\"/\" cmd=\"/us"..., 100, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 100
968 6950 poll([{fd=3, events=POLLIN}], 1, 500) = 1 ([{fd=3, revents=POLLIN}])
969 6950 recvfrom(3, "$\0\0\0\2\0\0\0\1\0\0\0&\33\0\0\0\0\0\0d\0\0\0c\4\5\0\1\0\0\0"..., 8988, MSG_PEEK|MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36
970 6950 recvfrom(3, "$\0\0\0\2\0\0\0\1\0\0\0&\33\0\0\0\0\0\0d\0\0\0c\4\5\0\1\0\0\0"..., 8988, MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36
971 6950 write(2, "sudo", 4) = 4
972 6950 write(2, ": ", 2) = 2
973 6950 write(2, "sorry, you must have a tty to ru"..., 38) = 38
974 6950 write(2, "\n", 1) = 1
975 6950 setresuid(4294967295, 4294967295, 4294967295) = 0
976 6950 setresgid(4294967295, 4294967295, 4294967295) = 0
977 6950 exit_group(1) = ?
978 6949 <... read resumed> "", 4096) = 0
979 6949 --- SIGCHLD (Child exited) # 0 (0) ---
980 6949 close(5) = 0
981 6949 wait4(6950, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 6950
This was solved by adding the following to /etc/sudoers (the "sorry, you must have a tty to run sudo" writes in the strace above are sudo enforcing the requiretty default):
Defaults:nagios !requiretty
In my case I resolved it by changing the ownership of the script files under /nagios/libexec/:
they do not work owned by root:root, but DO work owned by nagios:nagios!
I changed the permissions of my specific script in the libexec folder to allow "other" (non-root) users to execute it, with chmod 755 myfile.pl, and it worked well.

Deleting lines with sed or awk

I have a file data.txt like this.
>1BN5.txt
207
208
211
>1B24.txt
88
92
I have a folder F1 that contains text files.
1BN5.txt file in F1 folder is shown below.
ATOM 421 CA SER A 207 68.627 -29.819 8.533 1.00 50.79 C
ATOM 421 CA SER A 207 68.627 -29.819 8.533 1.00 50.79 C
ATOM 422 C SER A 248 70.124 -29.955 8.226 1.00 55.81 C
ATOM 615 H LEU B 208 3.361 -5.394 -6.021 1.00 10.00 H
ATOM 616 HA LEU B 211 2.930 -4.494 -3.302 1.00 10.00 H
ATOM 626 N MET B 87 1.054 -3.071 -5.633 1.00 10.00 N
ATOM 627 CA MET B 87 -0.213 -2.354 -5.826 1.00 10.00 C
1B24.txt file in F1 folder is shown below.
ATOM 630 CB MET B 87 -0.476 -2.140 -7.318 1.00 10.00 C
ATOM 631 CG MET B 88 -0.828 -0.688 -7.575 1.00 10.00 C
ATOM 632 SD MET B 88 -2.380 -0.156 -6.830 1.00 10.00 S
ATOM 643 N ALA B 92 -1.541 -4.371 -5.366 1.00 10.00 N
ATOM 644 CA ALA B 94 -2.560 -5.149 -4.675 1.00 10.00 C
I need only the lines whose 6th column contains 207, 208, or 211 in the 1BN5.txt file; I want to delete the other lines in 1BN5.txt. Likewise, I need only the lines containing 88 and 92 in the 1B24.txt file.
Desired output
1BN5.txt file
ATOM 421 CA SER A 207 68.627 -29.819 8.533 1.00 50.79 C
ATOM 421 CA SER A 207 68.627 -29.819 8.533 1.00 50.79 C
ATOM 615 H LEU B 208 3.361 -5.394 -6.021 1.00 10.00 H
ATOM 616 HA LEU B 211 2.930 -4.494 -3.302 1.00 10.00 H
1B24.txt file
ATOM 631 CG MET B 88 -0.828 -0.688 -7.575 1.00 10.00 C
ATOM 632 SD MET B 88 -2.380 -0.156 -6.830 1.00 10.00 S
ATOM 643 N ALA B 92 -1.541 -4.371 -5.366 1.00 10.00 N
Here's one way using GNU awk. Run like:
awk -f script.awk data.txt
Contents of script.awk:
/^>/ {
    file = substr($1,2)
    next
}

{
    a[file][$1]
}

END {
    for (i in a) {
        while ( ( getline line < ("./F1/" i) ) > 0 ) {
            split(line,b)
            for (j in a[i]) {
                if (b[6]==j) {
                    print line > "./F1/" i ".new"
                }
            }
        }
        system(sprintf("mv ./F1/%s.new ./F1/%s", i, i))
    }
}
Alternatively, here's the one-liner:
awk '/^>/ { file = substr($1,2); next } { a[file][$1] } END { for (i in a) { while ( ( getline line < ("./F1/" i) ) > 0 ) { split(line,b); for (j in a[i]) if (b[6]==j) print line > "./F1/" i ".new" } system(sprintf("mv ./F1/%s.new ./F1/%s", i, i)) } }' data.txt
If you have an older version of awk, older than GNU Awk 4.0.0, you could try the following. Run like:
awk -f script.awk data.txt
Contents of script.awk:
/^>/ {
    file = substr($1,2)
    next
}

{
    a[file] = ( a[file] ? a[file] SUBSEP : "" ) $1
}

END {
    for (i in a) {
        split(a[i],b,SUBSEP)
        while ( ( getline line < ("./F1/" i) ) > 0 ) {
            split(line,c)
            for (j in b) {
                if (c[6]==b[j]) {
                    print line > "./F1/" i ".new"
                }
            }
        }
        system(sprintf("mv ./F1/%s.new ./F1/%s", i, i))
    }
}
Alternatively, here's the one-liner:
awk '/^>/ { file = substr($1,2); next } { a[file]=( a[file] ? a[file] SUBSEP : "") $1 } END { for (i in a) { split(a[i],b,SUBSEP); while ( ( getline line < ("./F1/" i) ) > 0 ) { split(line,c); for (j in b) if (c[6]==b[j]) print line > "./F1/" i ".new" } system(sprintf("mv ./F1/%s.new ./F1/%s", i, i)) } }' data.txt
Please note that this script does exactly as you describe. It expects files like 1BN5.txt and 1B24.txt to reside in the folder F1 in the present working directory. It will also overwrite your original files. If this is not the desired behavior, drop the system() call. HTH.
Results:
Contents of F1/1BN5.txt:
ATOM 421 CA SER A 207 68.627 -29.819 8.533 1.00 50.79 C
ATOM 421 CA SER A 207 68.627 -29.819 8.533 1.00 50.79 C
ATOM 615 H LEU B 208 3.361 -5.394 -6.021 1.00 10.00 H
ATOM 616 HA LEU B 211 2.930 -4.494 -3.302 1.00 10.00 H
Contents of F1/1B24.txt:
ATOM 631 CG MET B 88 -0.828 -0.688 -7.575 1.00 10.00 C
ATOM 632 SD MET B 88 -2.380 -0.156 -6.830 1.00 10.00 S
ATOM 643 N ALA B 92 -1.541 -4.371 -5.366 1.00 10.00 N
Don't try to delete lines from the existing file, try to create a new file with only the lines you want to have:
cat 1BN5.txt | awk '$6 == 207 || $6 == 208 || $6 == 211 { print }' > output.txt
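If the list of residue numbers gets long, spelling them out in the condition becomes unwieldy; a lookup array keeps it manageable (a sketch, with the numbers inlined in the BEGIN block):
awk 'BEGIN { split("207 208 211", t); for (i in t) keep[t[i]] } $6 in keep' 1BN5.txt > output.txt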
Assuming gnu awk, run this command from the directory containing data.txt:
awk -F">" '{if($2 != ""){fname=$2}if($2 == ""){term=$1;system("grep "term" F1/"fname" >>F1/"fname"_results");}}' data.txt
This parses data.txt for filenames and search terms, then calls grep from inside awk to append the matches for each file and term listed in data.txt to a new file in F1 called originalfilename.txt_results.
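One caveat with the grep approach: it matches the term anywhere on the line, not just in the 6th column, so a residue number like 207 would also keep a line where 207 merely appears inside a coordinate. A field comparison against $6, as in the other answers, avoids that.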
If you want to replace the original files completely, you could then run this command:
grep "^>.*$" data.txt | sed 's/>//' | xargs -I{} find F1 -name {}_results -exec mv F1/{}_results F1/{} \;
This will move all of the files in F1 to a tmp dir named "backup" and then re-create just the resultant non-empty files under F1
mv F1 backup &&
mkdir F1 &&
awk '
FNR==NR {
    if (sub(/>/,"")) {
        file = $0
        ARGV[ARGC++] = "backup/" file
    }
    else {
        tgt["backup/" file, $0] = "F1/" file
    }
    next
}
(FILENAME,$6) in tgt {
    print > tgt[FILENAME,$6]
}
' data.txt &&
rm -rf backup
If you want the empty files too it's a trivial tweak and if you want to keep the backup dir just get rid of the "&& rm.." at the end (do that during testing anyway).
EDIT: FYI, this is one case where you could argue that getline is not completely incorrect, since it's parsing a first file that's totally unlike the rest of the files in structure and intent, so parsing that one file differently from the rest isn't going to cause any maintenance headaches later:
mv F1 backup &&
mkdir F1 &&
awk -v data="data.txt" '
BEGIN {
    while ( (getline line < data) > 0 ) {
        if (sub(/>/,"",line)) {
            file = line
            ARGV[ARGC++] = "backup/" file
        }
        else {
            tgt["backup/" file, line] = "F1/" file
        }
    }
}
(FILENAME,$6) in tgt {
    print > tgt[FILENAME,$6]
}
' &&
rm -rf backup
but as you can see it makes the script a bit more complicated (though slightly more efficient as there's now no test for FNR==NR in the main body).
This solution plays some tricks with the record separator: "data.txt" uses > as the record separator, while the other files use newline.
awk '
"}">
BEGIN { RS = ">" }
FNR == 1 {
    # since the first char in data.txt is the record separator,
    # there is an empty record before the real data starts
    next
}
{
    n = split($0, a, "\n")
    file = "F1/" a[1]
    newfile = file ".new"
    RS = "\n"
    while ((getline < file) > 0) {
        for (i=2; i<n; i++) {
            if ($6 == a[i]) {
                print > newfile
                break
            }
        }
    }
    RS = ">"
    system(sprintf("mv \"%s\" \"%s.bak\" && mv \"%s\" \"%s\"", file, file, newfile, file))
}
' data.txt
Definitely a job for awk:
$ awk '$6==207||$6==208||$6==211 { print }' 1BN5.txt
ATOM 421 CA SER A 207 68.627 -29.819 8.533 1.00 50.79 C
ATOM 421 CA SER A 207 68.627 -29.819 8.533 1.00 50.79 C
ATOM 615 H LEU B 208 3.361 -5.394 -6.021 1.00 10.00 H
ATOM 616 HA LEU B 211 2.930 -4.494 -3.302 1.00 10.00 H
$ awk '$6==92||$6==88 { print }' 1B24.txt
ATOM 631 CG MET B 88 -0.828 -0.688 -7.575 1.00 10.00 C
ATOM 632 SD MET B 88 -2.380 -0.156 -6.830 1.00 10.00 S
ATOM 643 N ALA B 92 -1.541 -4.371 -5.366 1.00 10.00 N
Redirect to save the output:
$ awk '$6==207||$6==208||$6==211 { print }' 1BN5.txt > output.txt
I don't think you can do this with just sed alone. You need a loop to read your file data.txt. For example, using a bash script:
#!/bin/bash
# First remove all possible "problematic" characters from data.txt, storing result
# in data.clean.txt. This removes everything except A-Z, a-z, 0-9, leading >, and ..
sed 's/[^A-Za-z0-9>\.]//g;s/\(.\)>/\1/g;/^$/d' data.txt >| data.clean.txt

# Next determine which lines to keep:
cat data.clean.txt | while read line; do
    if [[ "${line:0:1}" == ">" ]]; then
        # If input starts with ">", set remainder to be the current file
        file="${line:1}"
    else
        # If value is in the sixth column, add "keep" to end of line
        # Columns assumed separated by one or more spaces
        # "+" is a GNU extension, so we need the -r switch
        sed -i -r "/^[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +$line +/s/$/keep/" $file
    fi
done

# Finally delete the unwanted lines, i.e. those without "keep":
# (assumes each file appears only once in data.txt)
cat data.clean.txt | while read line; do
    if [[ "${line:0:1}" == ">" ]]; then
        sed -i -n "/keep/{s/keep//g;p;}" ${line:1}
    fi
done