LWP::Simple and cron - perl

This script works every time from command line but very rarely from cron job. I can run it like this: ./chort.pl
But it dies very frequently (not always) with "wrong values" message while calling it from cron:
*/2 10-18 * * 1-5 /path/to/chort.pl >> /tmp/chort.pl 2>&1
When it dies, than the $res is empty. print "*$res*\n"; prints **. So , it seems, that there is a problem with getting of webpage while calling from cron.
Here a snippet:
sub getLSEVal
{
my $fourWayKey = shift;
my $url = "http://pat.to.url";
my $res;
timeout 280 => sub {
$res = get $url ;
return (-2, -2, -2 );
};
print $res;
$res =~ /Price\| High \| Low .*?<tr class="odd">.*?<td>(.*?)<\/td>.*?<td>(.*?)<\/td>.*?<td>(.*?)<\/td>/msig;
my $c = $1;
my $h = $2;
my $l = $3;
print "$1 $2 $3\n";
die ("wrong values") if $c !~ /\d+\.?\d*/ or $h !~ /\d+\.?\d*/ or $l !~ /\d+\.?\d*/;
return ($c, $h, $l);
}

You probably need to use LWP::UserAgent, which allows you a higher level of control. LWP::Simple sometimes is too abstract to know what is happening when a problem appears.

Related

get request in perl and Use of uninitialized value

my $url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=journal+of+medical+virology[journal]+AND+2014[Date+-+Publication]:3000[Date+-+Publication]&usehistory=y";
print "\n before url \n";
print $url;
#post the esearch URL
my $output = get($url);
print $output;
I have not used perl ever before.
If I hit this URL in browser, I do get the XML.
However, From what I see in output from script, $output is empty and
print $output;
returns
Use of uninitialized value in print at ./extractEmails.pl line 48.
Please suggest what's wrong and how to fix it
Edit:
As suggested, complete code:
#!/usr/bin/perl -w
# A perlscript written by Joseph Hughes, University of Glasgow
# use this perl script to parse the email addressed from the affiliations in PubMed
use strict;
use LWP::Simple;
my ($query,#queries);
#Query the Journal of Virology from 2014 until the present (use 3000)
$query = 'journal+of+virology[journal]+AND+2014[Date+-+Publication]:3000[Date+-+Publication]';
push(#queries,$query);
#Journal of General Virology
$query = 'journal+of+general+virology[journal]+AND+2014[Date+-+Publication]:3000[Date+-+Publication]';
push(#queries,$query);
#Virology
$query = 'virology[journal]+AND+2014[Date+-+Publication]:3000[Date+-+Publication]';
push(#queries,$query);
#Archives of Virology
$query = 'archives+of+virology[journal]+AND+2014[Date+-+Publication]:3000[Date+-+Publication]';
push(#queries,$query);
#Virus Research
$query = 'virus+research[journal]+AND+2014[Date+-+Publication]:3000[Date+-+Publication]';
push(#queries,$query);
#Antiviral Research
$query = 'antiviral+research[journal]+AND+2014[Date+-+Publication]:3000[Date+-+Publication]';
push(#queries,$query);
#Viruses
$query = 'viruses[journal]+AND+2014[Date+-+Publication]:3000[Date+-+Publication]';
push(#queries,$query);
#Journal of Medical Virology
$query = 'journal+of+medical+virology[journal]+AND+2014[Date+-+Publication]:3000[Date+-+Publication]';
# global variables
push(#queries,$query);
my %emails;
my $emailcnt=0;
my $count=1;
#assemble the esearch URL
foreach my $query (#queries){
my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';
#my $url = $base . "esearch.fcgi?db=pubmed&term=$query&usehistory=y";
my $url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=journal+of+medical+virology[journal]+AND+2014[Date+-+Publication]:3000[Date+-+Publication]&usehistory=y";
print "\n before url \n";
print $url;
#post the esearch URL
my $output = get($url);
print "\n before output \n";
print get($url);
print $output;
#parse WebEnv, QueryKey and Count (# records retrieved)
my $web = $1 if ($output =~ /<WebEnv>(\S+)<\/WebEnv>/);
my $key = $1 if ($output =~ /<QueryKey>(\d+)<\/QueryKey>/);
my $count = $1 if ($output =~ /<Count>(\d+)<\/Count>/);
#retrieve data in batches of 500
my $retmax = 500;
for (my $retstart = 0; $retstart < $count; $retstart += $retmax) {
my $efetch_url = $base ."efetch.fcgi?db=pubmed&WebEnv=$web";
$efetch_url .= "&query_key=$key&retmode=xml";
my $efetch_out = get($efetch_url);
my #matches = $efetch_out =~ m(<Affiliation>(.*)</Affiliation>)g;
#print "$_\n" for #matches;
for my $match (#matches){
if ($match=~/\s([a-zA-Z0-9\.\_\-]+\#[a-zA-Z0-9\.\_\-]+)$/){
my $email=$1;
$email=~s/\.$//;
$emails{$email}++;
}
}
}
my $cnt= keys %emails;
print "$query\n$cnt\n";
}
print "Total number of emails: ";
my $cnt= keys %emails;
print "$cnt\n";
my #email = keys %emails;
my #VAR;
push #VAR, [ splice #email, 0, 100 ] while #email;
my $batch=100;
foreach my $VAR (#VAR){
open(OUT, ">Set_$batch\.txt") || die "Can't open file!\n";
print OUT join(",",#$VAR);
close OUT;
$batch=$batch+100;
}
I recommend against using LWP::Simple for any reason because it is impossible to configure it or handle errors usefully. Using LWP::UserAgent which it wraps is nearly as simple anyway (though the error handling is a bit complicated). The below examples would replace the use LWP::Simple; and my $output = get($url); lines.
use strict;
use warnings;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new(timeout => 30);
my $response = $ua->get($url);
unless ($response->is_success) {
# the Client-Warning, Client-Aborted, and X-Died headers each may be set on client/transport errors
die $response->status_line;
}
my $output = $response->decoded_content;
The core HTTP::Tiny is also simple.
use strict;
use warnings;
use HTTP::Tiny;
my $ua = HTTP::Tiny->new;
my $response = $ua->get($url);
unless ($response->{success}) {
die $response->{status} == 599 ? $response->{content} : "$response->{status} $response->{reason}";
}
my $output = $response->{content};
If you really want an LWP::Simple approach that will at least report transport errors, try ojo from Mojolicious:
perl -Mojo -E'say g(shift)->text' http://example.com
In a script rather than a oneliner, you can use Mojo::UserAgent directly, and also handle HTTP errors like above:
use strict;
use warnings;
use Mojo::UserAgent;
my $ua = Mojo::UserAgent->new;
my $response = $ua->get($url)->result;
unless ($response->is_success) {
die $response->code . ' ' . $response->message;
}
my $output = $response->text;

Perl two identical strings compare as not equal

Here is my code:
my $self = shift;
my $h = shift;
print "$h\n";
my #headers = split /,/, $h;
foreach my $el (#{$expected}) {
my $t = shift #headers;
chomp ($t);
chomp ($el);
print Dumper($el cmp $t, $el, $t);
print "test: \'$el\' eq \'$t\' ";
unless ($el eq $t) {
print "not ok $el ne $t\n";
return 0;
} else {
print "ok\n";
}
}
return 1;
In my first unit test I pass a string to $h which matches $expected.
I then have a unit test which ensure the function fails when passed a string that does not match. These two tests behave as expected.
Server,Jira Project,Issue Type,Summary,Description,Assignee,Labels,Epic Link
$VAR1 = 0;
$VAR2 = 'Server';
$VAR3 = 'Server';
test: 'Server' eq 'Server' ok
When I pull the line in from a CSV file and pass it to this function I get a different response.
Server,Jira Project,Issue Type,Summary,Description,Assignee,Labels,Epic Link
$VAR1 = -1;
$VAR2 = 'Server';
$VAR3 = 'Server';
test: 'Server' eq 'Server' not ok Server ne Server
The compare somehow implies that the expected value is somehow less than what is being tested. My mind immediately goes to there must be a leading or trailing character on what is being tested. However, printing and Dumping doesn't seem to confirm that (unless I'm missing something). The chomps are just out of sheer desperation.
What am I missing?
First off thanks #hobbs for the $Data::Dumper::Useqq = 1; suggestion. It was immediately obvious that there were some leading characters that I didn't know how to detect.
$VAR1 = -1;
$VAR2 = "Server";
$VAR3 = "\357\273\277Server";
The CSV file I am reading contained Byte Ordermarking information.
I'll strip them off. Thanks all for getting me going again.

How to exclude or not print previous found entries in script

Can someone shed some light on how to have my script kick off only if new entries are found for the current hour? Our logs are based in 00 01 02, etc.
When this runs it will look for any accounts specified within my for loop and send an email if this particular user made a cert change for the hour. If a match is found then everything is fine.
But I am interested only in new real-time entries which I can't figure out.
This will be run from a cron, and I can't have it repeating the same entry. I am new to Perl and can't seem to figure this out.
I have tried the File::Tail module and other CPAN mods but due to company policies some mods are not allowed.
#!/usr/bin/perl -w
use strict;
my $flag = 0;
my $few = shift || 1;
my $id;
my $newline;
my $partyId;
my $userid;
my $tid;
my $infile;
my #Takeraccounts = ( 'SCN', 'CX' );
my $mail_dest = 'xxxxx#cx.com';
my %TIME;
(
$TIME{SEC}, $TIME{MIN}, $TIME{HOUR}, $TIME{MDAY}, $TIME{MON},
$TIME{YEAR}, $TIME{WDAY}, $TIME{YDAY}, $TIME{ISDST}
) = localtime(time);
my $OLD_MIN = $TIME{MIN};
my $OLD_HOUR = $TIME{HOUR};
my $cmd = "cat /raid/logs/`date +%H`";
my $out_file = "/home/resource/certchange.txt";
open FF, "$cmd |";
open( OUT, ">> $out_file" ) || die "Cannot open $out_file"; # temp file to which to write the formated output
while ( <FF> ) {
my $line = $_;
#chomp ($now_time);
$line =~ s/\n/ /;
if ( /Updating cert/ .. /,permissions/ ) {
$newline = "$line";
if ( $line =~ /Updating cert.*updated by (\w+)/ ) {
$id = $1;
}
if ( $newline =~ /UPDATE.*id:(\w+).*partyId:(\w+),perm:/ ) {
$userid = $1;
$partyId = $2;
foreach (#Takeraccounts) {
if ( $partyId =~ /$_/ ) {
print OUT "Certificate cert Updated by $id for userid $userid, PartyID $partyId\n";
open ML, "| mutt -e\"set realname='Support'; set use_from=yes; set from='support\#cx.com'; set envelope_from=yes\" -s ' Alert! cert CHANGED' -i $out_file -- $mail_dest";
close ML;
}
}
}
}
}
close FF;
close(OUT);
unlink $out_file;

modification of script in perl

currently I have the following script
#!/usr/bin/env perl
use strict;
use warnings;
my %seen;
my $header = <> . <>;
print $header;
my $last_sequence_number = 0;
open( my $output, ">", "output.$last_sequence_number.out" ) or die $!;
print {$output} $header;
$seen{$last_sequence_number}++;
while (<>) {
my ($key) = split;
next unless $key =~ m/^\d+$/;
my $sequence_number = int( $key / 1000 );
if ( not $sequence_number == $last_sequence_number ) {
print "Opening new file for $sequence_number\n";
close($output);
open( $output, ">", "output.$sequence_number.out" ) or die $!;
print {$output} $header unless $seen{$sequence_number}++;
$last_sequence_number = $sequence_number;
}
print {$output} $_;
}
the script splits a file into other files with the pattern file 1 file 2 ... now I would need to pass to the script another parameter which allows to specify a prefix for the output so if this additional input is 1 then the output would be
1_file1,1_file2....and so on.. how could I do that?
I know that something like
use Getopt::Long;
could be used?
tried this
#!/usr/bin/env perl
use strict;
use warnings;
my %seen;
my $header = <> . <>;
print $header;
my ( $suffix, $filename ) = #ARGV;
open ( my $input, "<", $filename ) or die $!;
my $last_sequence_number = 0;
open( my $output, ">", "output.$last_sequence_number.out" ) or die $!;
print {$output} $header;
$seen{$last_sequence_number}++;
while (<$input>) {
my ($key) = split;
next unless $key =~ m/^\d+$/;
my $sequence_number = int( $key / 1000 );
if ( not $sequence_number == $last_sequence_number ) {
print "Opening new file for $sequence_number\n";
close($output);
open( $output, ">", "output.$sequence_number.out" ) or die $!;
print {$output} $header unless $seen{$sequence_number}++;
$last_sequence_number = $sequence_number;
}
print {$output} $_;
}
but that is not working. What is wrong?
I get
No such file or directory at ./spl.pl line 10, <> line 2.
after the header is printed.
As Sobrique says, your problem is the magical nature of <>. But I don't think that it's as hard to deal with as he thinks.
The point is that <> looks at the current value of #ARGV. So you can add other command line arguments as long as you ensure that you have removed them from #ARGV before you use <> for the first time.
So change your code so that it starts like this:
my %seen;
my $prefix = shift;
my $header = <> . <>;
You can then call your program like this:
$ your_program.pl prefix_goes_here list of file names...
Everything else should now work the same as it currently does, but you have your prefix stored away in $prefix so that you can use it in your print statements.
I hope that's what you wanted. Your question isn't particularly clear.
I would do something like this.
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use Getopt::Long qw(:config bundling);
use Pod::Usage;
{
my $man = 0;
my $help = 0;
my $verbose = 0;
my $prefix = '';
my $suffix = '';
my $header_lines = 2;
my $bunch_size = 1000;
GetOptions(
'help|?' => \$help,
'man' => \$man,
'verbose|v+' => \$verbose,
'prefix|p=s' => \$prefix,
'suffix|s=s' => \$suffix,
'header|h=i' => \$header_lines,
'bunch|batch|bucket|b=i' => \$bunch_size
) or pod2usage(2);
pod2usage(1) if $help;
pod2usage( -exitval => 0, -verbose => 2 ) if $man;
pod2usage(
-exitval => 3,
-message => "Headers lines can't be negative number"
) if $header_lines < 0;
pod2usage(
-exitval => 4,
-message => "Bunch size has to be positive"
) unless $bunch_size > 0;
my $header = '';
$header .= <> for 1 .. $header_lines;
my %seen;
my $current_output_number = -1;
sub key2output { int( shift() / $bunch_size ) }
sub set_output {
my $output_number = shift;
if ( $output_number != $current_output_number ) {
my $seen = $seen{$output_number}++;
printf STDOUT "Opening %sfile for %d\n", $seen ? '' : 'new ',
$output_number
if $verbose;
open my $fh, $seen ? '>>' : '>',
$prefix . $output_number . $suffix;
select $fh;
print $header unless $seen;
$current_output_number = $output_number;
}
}
}
while (<>) {
my ($key) = /^(\d+)\s/;
next unless defined $key;
set_output( key2output($key) );
print;
}
__END__
=head1 NAME
code.pl - splits file by first number by thousands
=head1 SYNOPSIS
code.pl [options] [file ...]
Options:
--help brief help message
--man full documentation
--prefix output filename prefix
--suffix outpit filename suffix
--header number of header lines (default: 2)
=head1 OPTIONS
=over 8
=item B<--help>
Print a brief help message and exits.
=item B<--man>
Prints the manual page and exits.
=back
=head1 DESCRIPTION
B<This program> will read the given input file(s) and do something
useful with the contents thereof.
=cut
Just finish documentation and you can ship it to your colleagues.
The problem you've got is that the diamond operator <> is a piece of special perl magic.
It takes 'all filenames on command line' opens them and processes them in order.
To do what you're trying to do:
my ( $suffix, $filename ) = #ARGV;
open ( my $input, "<", $filename ) or die $!;
Then you can change your while loop to:
while ( <$input> ) {
And modify the output filename according to your desires. The key different there is that it'll only take one filename at that point - first arg is suffix, second is name.
You could perhaps extend this with:
my ( $suffix, #names ) = #ARGV;
And then run a foreach loop:
foreach my $filename ( #names ) {
open .... #etc

Perl resume download from this script

I have this Perl-based download script.
I'd like to know how to make sure that when a user downloads a file with this script, can pause and resume the download (download resumable).
This is the code:
#!/usr/bin/perl
use XFSConfig;
use HCE_MD5;
use CGI::Carp qw(fatalsToBrowser);
my $code = (split('/',$ENV{REQUEST_URI}))[-2];
my $hce = HCE_MD5->new($c->{dl_key},"XFileSharingPRO");
my ($file_id,$file_code,$speed,$ip1,$ip2,$ip3,$ip4,$expire) = unpack("LA12SC4L", $hce->hce_block_decrypt(decode($code)) );
print("Content-type:text/html\n\nLink expired"),exit if time > $expire;
$speed||=500;
my $dx = sprintf("%05d",$file_id/$c->{files_per_folder});
my $ip="$ip1.$ip2.$ip3.$ip4";
$ip=~s/\.0$/.\\d+/;
$ip=~s/\.0\./.\\d+./;
$ip=~s/\.0\./.\\d+./;
$ip=~s/^0\./\\d+./;
print("Content-type:text/html\n\nNo file"),exit unless -f "$c->{upload_dir}/$dx/$file_code";
print("Content-type:text/html\n\nWrong IP"),exit if $ip && $ENV{REMOTE_ADDR}!~/^$ip/;
my $fsize = -s "$c->{upload_dir}/$dx/$file_code";
$|++;
open(my $in_fh,"$c->{upload_dir}/$dx/$file_code") || die"Can't open source file";
# unless($ENV{HTTP_ACCEPT_CHARSET}=~/utf-8/i)
# {
# $fname =~ s/([^A-Za-z0-9\-_.!~*'() ])/ uc sprintf "%%%02x",ord $1 /eg;
# $fname =~ tr/ /+/;
# }
print qq{Content-Type: application/octet-stream\n};
print qq{Content-length: $fsize\n};
#print qq{Content-Disposition: attachment; filename="$fname"\n};
print qq{Content-Disposition: attachment\n};
print qq{Content-Transfer-Encoding: binary\n\n};
$speed = int 1024*$speed/10;
my $buf;
while( read($in_fh, $buf, $speed) )
{
print $buf;
select(undef,undef,undef,0.1);
}
sub decode
{
$_ = shift;
my( $l );
tr|a-z2-7|\0-\37|;
$_=unpack('B*', $_);
s/000(.....)/$1/g;
$l=length;
$_=substr($_, 0, $l & ~7) if $l & 7;
$_=pack('B*', $_);
}
Thanks
To pause and resume downloads you should handle the http range header.
Take a look at http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35