This is my first question so I apologise in advance if I format/ask it all wrong.
I am using Perl to extract a string from a file, submit a web form, and download a new file created by the web page. The aim is to have it run for 30,000 files in a loop, which I estimate will take ~8 days. I am using WWW::Selenium and WWW::Mechanize to perform the web automation. The issue I have is that if for some reason a page doesn't load properly or the internet drops for a period of time, the script exits with an error message like the following (depending on which stage it failed at):
Error requesting http://localhost:4444/selenium-server/driver/:
ERROR: Could not find element attribute: link=Download PDB File@href
I would like the script to continue running, moving onto the next round of the loop so I don't have to worry if a single round of the loop throws an error. My research suggests that using Try::Tiny may be the best solution. Currently I have the script below using only try{...} which seems to suppress any error and allow the script to continue through the files. However I'm concerned that this seems to be a very blunt solution and provides me no insight into which/why files failed.
Ideally I would want to print the filename and error message for each occurrence to another file that could then be reviewed once the script is complete, but I am struggling to understand how to use catch{...} to do this, or whether that is even the correct solution.
use strict;
use warnings;
use WWW::Selenium;
use WWW::Mechanize;
use Try::Tiny;
my @fastas = <*.fasta>;
foreach my $file (@fastas) {
try{
open(my $fh, "<", $file);
my $sequence;
my $id = substr($file, 0, -6);
while (my $line = <$fh>) {
## discard fasta header line
if ($line =~ /^>/) { # / (turn off wrong coloring)
next;
## keep line, add to sequence string
} else {
$sequence .= $line;
}
}
close ($fh);
my $sel = WWW::Selenium->new( host => "localhost",
port => 4444,
browser => "*firefox",
browser_url => "http://www.myurl.com",
);
$sel->start;
$sel->open("http://www.myurl.com");
$sel->type("chain1", $sequence);
$sel->type("chain2", "EVQLVESGPGLVQPGKSLRLSCVASGFTFSGYGMHWVRQAPGKGLEWIALIIYDESNKYYADSVKGRFTISRDNSKNTLYLQMSSLRAEDTAVFYCAKVKFYDPTAPNDYWGQGTLVTVSS");
$sel->click("css=input.btn.btn-success");
$sel->wait_for_page_to_load("30000");
## Wait through the holding page - will timeout after 5 mins
$sel->wait_for_element_present("link=Download PDB File", "300000");
## Get the filename part of link
$sel->wait_for_page_to_load("30000");
my $pdbName = $sel->get_attribute("link=Download PDB File\@href");
## Concatenate it with the main domain
my $link = "http://www.myurl.com/" . $pdbName;
$sel->stop;
my $mech = WWW::Mechanize->new( autocheck => 1 );
$mech -> get($link);
#print $mech -> content();
$mech -> save_content($id . ".pdb");
};
}
You are completely right that you want to see, log, and review all errors (and warnings). The mechanism and syntax provided by Try::Tiny is meant to be bare-bones and simple to use.
use warnings;
use strict;
use feature qw(say);
use Try::Tiny;
my @fastas = <*.fasta>;
my $errlog = 'error_log.txt';
open my $fh_err, '>', $errlog or die "Can't open $errlog for writing: $!";
foreach my $file (@fastas) {
try {
# processing, potentially throwing a die
}
catch {
say $fh_err "Error with $file: $_"; # NOTE, it is $_ (not $! or $@)
};
}
close $fh_err;
# Remove the log if empty
if (-z $errlog) {
say "No errors logged, removing $errlog";
unlink $errlog or warn "Can't unlink $errlog: $!";
}
You can save the names of files for which the processing failed, with push @failed_files, $file inside the catch { } block. The code can then attempt those files again after the main processing, if you know that errors are mostly due to random connection problems. Having the list of failed files is handy in any case.
Note that with v5.14 the problems that this module addresses were fixed, so that a normal use of eval is fine. It is mostly a matter of preference at this point, but note that Try::Tiny has a few twists of its own. See this post for a discussion.
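As a rough sketch of the log-and-retry idea using a plain eval (core Perl only): process_file and run_batch are hypothetical names, and the failure is simulated, so treat this as an illustration of the pattern rather than a drop-in replacement for the Selenium code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real per-file work; it dies on failure.
sub process_file {
    my ($file) = @_;
    die "simulated network failure\n" if $file =~ /bad/;
    return 1;
}

# Process each file, appending error messages to @$log.
# Returns the list of files that failed.
sub run_batch {
    my ($files, $log) = @_;
    my @failed;
    for my $file (@$files) {
        my $ok = eval { process_file($file); 1 };
        next if $ok;
        my $err = $@ || 'unknown error';
        chomp $err;
        push @$log, "Error with $file: $err";
        push @failed, $file;
    }
    return @failed;
}

my @log;
my @failed = run_batch([qw(a.fasta bad.fasta c.fasta)], \@log);

# One retry pass over the failures only.
my @still_failed = run_batch(\@failed, \@log);

print "$_\n" for @log;
print "Still failing: @still_failed\n";
```

In a real run the log lines would go to the error file rather than an array, and the retry pass would probably be delayed to give a dropped connection time to recover.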
This addresses the question of the simple exception handling, not the rest of the code.
Related
I've looked around here a bit and found similar questions but not exactly. If there is one, I apologize and please point me to it.
I have the following code. I'm trying to create a csv file of simply an ID pulled from a filename and the filename itself. This is the ENTIRE script.
use strict;
use warnings;
use File::Find;
find( \&findAllFiles, '.');
exit;
sub findAllFiles {
my @fp1;
my @fp2;
my $patId;
my $filename;
my $testvar = "hello again";
$filename = $File::Find::name;
if ($filename =~ /\.pdf$/) {
open (my $fh, '>', 'filenames.csv') or die "Failed to open - $!\n";
print $fh "starting...$testvar\n" or die "Failed to print to file - $!\n";
@fp1 = split('/', $filename);
@fp2 = split('_', $fp1[-1]);
$patId = $fp2[-1];
$patId =~ s/\.pdf$//;
print "Adding $patId, file = $filename\n";
print $fh "$patId,$filename\n" or die "File print error: $!";
close $fh or warn "close failed! - $!";
}
return;
}
The line that prints to the screen, prints perfectly.
If I take the file open/close and the first print statement out of the if block, it prints that line into the file, but not the data inside the block.
I've tried every combo I can think of and it doesn't work. I've alternated between '>' and '>>' since it clearly needs the append since it's looping over filenames, but neither works inside the if block.
Even this code above doesn't throw the die errors! It just ignores those lines! I'm figuring there's something obvious I'm missing.
Quoting File::Find::find's documentation:
Additionally, for each directory found, it will chdir() into that directory
It means that when you open inside findAllFiles, you are potentially opening a file filenames.csv inside a subdirectory of your initial directory. You can run something like find . -name filenames.csv from your terminal, and you'll see plenty of filenames.csv. You can change this behavior by passing no_chdir option to find:
find( { wanted => \&findAllFiles, no_chdir => 1}, '.');
(and additionally changing > for >> in your open)
However, personally, I'd avoid repeatedly opening and closing filenames.csv when you could open it just once before calling find. If you don't want to have your filehandle globally defined, you can always pass it as an argument to findAllFiles:
{
open my $fh, '>', 'filenames.csv' or die "Failed to open 'filenames.csv': $!";
find(sub { findAllFiles($fh) }, '.');
}
sub findAllFiles {
my ($fh) = @_;
...
}
filenames.csv will be created in the directory where the pdf is found, since find() changes directories as it searches. If that's not what you want, use an absolute path to open it (or open it before calling find, which seems like a better idea).
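Putting both suggestions together, a minimal sketch might look like the following. The sub name write_pdf_index is made up for illustration, and the ID extraction mirrors the question's split on '_'; the file is opened once, and no_chdir keeps $File::Find::name usable as a path from the starting directory.

```perl
use strict;
use warnings;
use File::Find;
use File::Basename qw(basename);

# Write one "id,path" line to $csv for every .pdf file found under $dir.
sub write_pdf_index {
    my ($dir, $csv) = @_;

    # Open the output once, before find() starts walking the tree.
    open my $fh, '>', $csv or die "Failed to open '$csv': $!";

    find(
        {
            no_chdir => 1,    # stay in the starting directory
            wanted   => sub {
                my $path = $File::Find::name;
                return unless -f $path && $path =~ /\.pdf$/;

                # ID = last '_'-separated piece of the name, minus ".pdf"
                (my $name = basename($path)) =~ s/\.pdf$//;
                my $id = (split /_/, $name)[-1];

                print {$fh} "$id,$path\n" or die "Failed to print: $!";
            },
        },
        $dir
    );

    close $fh or die "Failed to close '$csv': $!";
    return;
}
```

Calling write_pdf_index('.', 'filenames.csv') would then produce a single CSV in the current directory, regardless of how deep the PDFs are nested.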
#!/bin/perl
open( $WP, ">/home/Octa.txt" );
# Subroutine test
sub test {
$var1 = shift;
print $WP "TESTING\n";
}
# Subroutine func
sub func {
$var = shift;
if ( $var eq "Ocat" ) {
print $WP "String found\n";
test($var);
}
else {
print $WP "String not found\n";
}
}
$var3 = "Octa";
func($var3);
The issue is that the code is not able to write anything within the test subroutine or within the if condition of the func subroutine, but it prints in the else part of the func subroutine.
First off, there is a typo -- you test $var against "Ocat", while Octa is intended.
So the test subroutine never gets called and only String not found is printed.
With that corrected and with the output file in a user writeable location, your program works.
However, some improvements are necessary.
use warnings;
use strict;
my $file = 'Octa.txt';
open my $WP, '>', $file or die "Can't open $file: $!";
my $var3 = "Octa";
func($WP, $var3);
#Subroutine test
sub test{
my ($fh, $var1) = @_;
print $fh "TESTING\n";
}
#Subroutine func
sub func{
my ($fh, $var) = @_;
if ($var eq "Octa"){
print $fh "String found\n";
test($fh, $var);
}
else {
print $fh "String not found\n";
}
}
I've changed the output file name since a user normally may not write to /home directory.
Comments
It is much better to use the three-argument form of open, in which case you get a lexical file handle which can be passed around nicely and is scoped. This question is a good example of how a global file handle can make things confusing, to say the least.
Always check the open call. For one thing, can you really write to /home directory?
Please always start programs with use warnings; and use strict;
There is another possibility for failure, which brings together practices in the comments above.
A file in /home normally isn't writeable by a user, in which case the posted program cannot work.
But without a check of open (which will fail) and without use warnings (which would warn every time we touch the invalid $WP filehandle) we will not see any of these errors; instead, the program will quietly run and complete but it won't write the output file.
I am trying to upload a set of files to a remote machine in Perl using the Net::SFTP::Foreign module. The file names are stored in a text file, but not all of the files are getting uploaded; only one file is uploaded.
#!/usr/local/roadm/bin/perl
# This is compiled with threading support
use strict;
use warnings;
use threads;
use threads::shared;
use Net::SFTP::Foreign;
my $count=0;
my %args = (
user => 'root',
password => 'Ht5h10N2',
more => '-v',
autodisconnect => 0
);
print "Starting main program\n";
open(fa ,"<file_list.txt");
my @con = <fa>;
close fa;
my $sftp = Net::SFTP::Foreign->new('hadoop-dev2', %args);
foreach (@con)
{
chomp $_;
$sftp->put($_,"$_");
}
You never check the status of your various operations! Did your initial constructor creating $sftp work? Was that file you opened really opened? Does that file exist on the remote system?
You must always check the status of your commands in Perl!
use strict;
use warnings;
use feature qw(say);
use File::Basename;
use threads;
use threads::shared;
use Net::SFTP::Foreign;
my %args = (
user => 'root',
password => 'Ht5h10N2',
more => '-v',
autodisconnect => 0
);
my $sftp = Net::SFTP::Foreign->new('hadoop-dev2', %args); # Check whether succeeded or failed!
if ( $sftp->error ) {
die qq(Could not establish the SFTP connection: ) . $sftp->error;
}
say "Starting main program";
open my $fh, "<", "file_list.txt" # Check whether succeeded or failed!
or die qq(Could not open file "file_list.txt": $!);
while ( my $file = <$fh> ) {
chomp $file;
$sftp->put( $file, $file ); # Check whether succeeded or failed!
if ( $sftp->error ) {
warn qq(Could not upload file "$file": ) . $sftp->error;
my $remote_files_ref = $sftp->ls(); # Check whether succeeded or failed!
if ( $sftp->error ) {
warn qq(Cannot list the remote directory.);
}
else {
say qq(List of files in the remote directory:);
for my $remote_file ( @{ $remote_files_ref } ) {
say " $remote_file->{filename}"; # each entry is a hash with a filename key
}
}
}
}
Note I check whether my open worked, whether the constructor for $sftp worked, and every time I use a method from Net::SFTP::Foreign. For example, I can't download a file that doesn't exist. Maybe it doesn't exist, thus I do a $sftp->ls to see when it doesn't work.
You can use autodie which is a pragma for various file commands in Perl, and is a setting you can use for Net::SFTP::Foreign. Autodie is nice because it kills your program automatically upon an error, thus turning perl into more of an exception based language. This way, if there's an error, and you don't catch it, your program dies.
If you don't want your program to fail outright, you can use eval to test whether something worked or not:
my $sftp = Net::SFTP::Foreign->new( 'hadoop-dev2', %args, autodie => 1 ); # Autodie is now turned on
eval { # Dies if the transfer fails, e.g. because the file doesn't exist
$sftp->get( $file, $file );
};
if ( $@ ) {
warn qq(ERROR: File "$file" is not found!);
} else {
say qq(Downloaded "$file".);
}
Reply
There's no error when I run your code, but I still can't upload :( Any help would be appreciated.
So, you're saying that $sftp->get doesn't download the file, but neither sets $sftp->error?
There are a few places in the code where I see that an undef is returned, but $sftp->_set_error isn't being called. Let's just see if $sftp->get returns a true value or undef. According to the source code, that's what it should be doing. If it's undef, we'll assume it failed.
while ( my $file = <$fh> ) {
chomp $file;
if ( not $sftp->put( $file, $file ) ) { # Check whether succeeded or failed!
warn qq(Could not upload file "$file");
my $remote_files_ref;
if ( not $remote_files_ref = $sftp->ls() ) { # Check whether succeeded or failed!
warn qq(Cannot list the remote directory.);
}
else {
say qq(List of files in the remote directory:);
for my $remote_file ( @{ $remote_files_ref } ) {
say " $remote_file->{filename}";
}
}
}
}
I'll bet that the problem is that you have some full paths to files in your file_list.txt and that the paths to the files do not exist on the FTP server. Remember that FTP doesn't create directories for you, so if you have
/etc/passwd
in your file_list.txt, you better have a directory called
/etc
on your FTP server.
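To check this before uploading, one option is to work out which remote directories the paths in file_list.txt depend on. The sketch below is core Perl only; required_remote_dirs is a hypothetical helper name, and it assumes the remote paths mirror the local ones, as in the put( $file, $file ) call above.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Return the unique directories that must already exist on the server
# for each of @paths to be uploaded to the same path remotely.
sub required_remote_dirs {
    my (@paths) = @_;
    my (%seen, @dirs);
    for my $path (@paths) {
        my $dir = dirname($path);
        next if $dir eq '.' || $dir eq '/';    # nothing to create
        push @dirs, $dir unless $seen{$dir}++;
    }
    return @dirs;
}

my @dirs = required_remote_dirs(qw(/etc/passwd data/in/a.txt data/in/b.txt c.txt));
print "$_\n" for @dirs;
```

Each directory in the result could then be checked or created on the server before the put loop runs.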
The problem was that the code was being run on Windows with Cygwin. One of the major errors was that the threads were competing for the single VTY resource and timing out, so I resolved it using an expect script.
I have a problem when I use Apache::DBI in child processes. The problem is that Apache::DBI provides a single handle for all processes which use it, so I get
DBD::mysql::db selectall_arrayref
failed: Commands out of sync; you
can't run this command now at
/usr/local/www/apache22/data/test-fork.cgi
line 20.
Reconnection doesn't help, since Apache::DBI reconnects in all processes, as far as I can tell from the following error:
The server encountered an internal
error and was unable to complete your
request.
Error message: DBD driver has not
implemented the AutoCommit attribute
at
/usr/local/lib/perl5/site_perl/5.8.9/Apache/DBI.pm
line 283.
Here's the original code:
use Data::Dumper 'Dumper';
use DBI ();
my $dbh = DBI->connect($dsn, $username, $password, {
RaiseError => 1,
PrintError => 0,
});
my $file = "/tmp/test-fork.tmp";
my $pid = fork;
defined $pid or die "fork: $!";
if ($pid) {
my $rows = eval { $dbh->selectall_arrayref('SELECT SLEEP(1)') };
print "Content-Type: text/plain\n\n";
print $rows ? "parent: " . Dumper($rows) : $@;
}
else {
my $rows = eval { $dbh->selectall_arrayref('SELECT SLEEP(1)') };
open FH, '>', $file or die "$file: $!";
print FH $rows ? "child: " . Dumper($rows) : $@;
close FH;
}
The code I used for reconnection:
...
else {
$dbh->disconnect;
$dbh = DBI->connect($dsn, $username, $password, $attrs);
my $rows = eval { $dbh->selectall_arrayref('SELECT SLEEP(1)') };
open FH, '>', $file or die "$file: $!";
print FH $rows ? "child: " . Dumper($rows) : $@;
close FH;
}
Is there a safe way to use Apache::DBI with forking? Is there a way to make it create a new connection perhaps?
I see a few options:
Explicitly close your DB handles when you fork, and reopen them as needed.
e.g.:
my $dbh = DBI->connect(...);
my $pid = fork;
defined $pid or die "fork: $!";
if ($pid) {
# parent...
}
else {
# child...
undef $dbh; # drop the inherited handle; reconnect when the child needs one
}
This could be made easier by storing the $dbh in an object, and passing around that object as needed to parts of your system. The object would be responsible for reopening the $dbh as needed, so the rest of the application doesn't have to concern itself with the details. Keep code encapsulated and well-separated from other parts of the system.
Don't use Apache::DBI. I can highly recommend DBIx::Connector, which opens a new connection as needed and doesn't preserve the bad behaviour of either plain DBI or Apache::DBI: see http://search.cpan.org/~dwheeler/DBIx-Connector-0.32/lib/DBIx/Connector.pm#Description for a detailed description of how it differs.
I use DBIx::Connector in my system inside a Moose object, which uses method delegation to provide the dbh. The application simply does:
my $dbh = $db_dbj->dbh;
my $sth = $dbh->prepare(...);
# more boring DBI code here
...And the dbh is reconnected/regenerated as needed, invisibly.
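The first option above (an object that owns the handle and reopens it as needed) might be sketched roughly like this. ForkSafeHandle is a made-up class, not part of DBI or Apache::DBI, and the connect callback here returns a dummy hash so the example runs without a database; in real code it would wrap a DBI->connect call.

```perl
use strict;
use warnings;

package ForkSafeHandle;

# connect => coderef that returns a fresh handle (e.g. wraps DBI->connect).
sub new {
    my ($class, %args) = @_;
    my $self = { connect => $args{connect}, pid => undef, handle => undef };
    return bless $self, $class;
}

# Return the handle, reconnecting if there is none yet or if the current
# process ID differs from the one that created it (i.e. we have forked).
sub handle {
    my ($self) = @_;
    if (!defined $self->{handle} || $self->{pid} != $$) {
        # With real DBI handles you would likely set InactiveDestroy on the
        # inherited handle first, so that dropping it in the child does not
        # close the parent's connection.
        $self->{handle} = $self->{connect}->();
        $self->{pid}    = $$;
    }
    return $self->{handle};
}

package main;

my $connections = 0;
my $db = ForkSafeHandle->new(connect => sub { return { id => ++$connections } });

print "first call:  connection ", $db->handle->{id}, "\n";
print "second call: connection ", $db->handle->{id}, "\n";
```

Both calls in the same process reuse one connection; a child created with fork would get its own on its first call to handle.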
As an aside, you should be really careful of using bare filehandles in a multiprocess environment. You could be very easily clobbering your data. open (my $fh, $file) or die "Cannot open $file: $!" is much safer.
I'm also a little nervous seeing you use eval {} blocks without checking the contents of $@. You're just masking errors, rather than dealing with them, so there may be more things going on than you are aware of. Check your result values (or better, use an explicit exception-handling module, such as Try::Tiny). Also use strict; and use warnings;.
PS. I just noticed that you are explicitly including DBI in your code. Don't do that. If you use Apache::DBI in your startup_modperl.pl (or whatever you call your bootstrap file), you should never have to include DBI itself. I can't say for sure but I wouldn't be confident the right package is getting called (it's been a while since I looked at Apache::DBI's guts; it might take care of this for you though).
Don't fork under mod_perl2. Use Apache2::SubProcess. See also Is it a bad idea to fork under mod_perl2?
You are given either an IO::File object or a typeglob (\*STDOUT or Symbol::symbol_to_ref("main::FH")); how would you go about determining if it is a read or write handle? The interface cannot be extended to pass this information (I am overriding close to add calls to flush and sync before the actual close).
Currently I am attempting to flush and sync the filehandle and ignoring the error "Invalid argument" (which is what I get when I attempt to flush or sync a read filehandle):
eval { $fh->flush; 1 } or do {
#this seems to exclude flushes on read handles
unless ($! =~ /Invalid argument/) {
croak "could not flush $fh: $!";
}
};
eval { $fh->sync; 1 } or do {
#this seems to exclude syncs on read handles
unless ($! =~ /Invalid argument/) {
croak "could not sync $fh: $!";
}
};
Have a look at the fcntl options. Maybe F_GETFL with O_ACCMODE.
Edit: I did a little googling and playing over lunch, and here is some probably non-portable code; it works on my Linux box, and probably on any POSIX system (perhaps even Cygwin, who knows?).
use strict;
use Fcntl;
use IO::File;
my $file;
my %modes = ( 0 => 'Read only', 1 => 'Write only', 2 => 'Read / Write' );
sub open_type {
my $fh = shift;
my $mode = fcntl($fh, F_GETFL, 0);
print "File is: " . $modes{$mode & 3} . "\n";
}
print "out\n";
$file = new IO::File();
$file->open('> /tmp/out');
open_type($file);
print "\n";
print "in\n";
$file = new IO::File();
$file->open('< /etc/passwd');
open_type($file);
print "\n";
print "both\n";
$file = new IO::File();
$file->open('+< /tmp/out');
open_type($file);
Example output:
$ perl test.pl
out
File is: Write only
in
File is: Read only
both
File is: Read / Write
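For the original use case (deciding whether to flush and sync before close), the same fcntl call can be wrapped in a predicate. This is a sketch under the same POSIX assumption as the code above; is_writable_handle is a hypothetical name, and it uses the named Fcntl constants instead of the literal 0/1/2 and the mask 3.

```perl
use strict;
use warnings;
use Fcntl qw(F_GETFL O_ACCMODE O_WRONLY O_RDWR);
use File::Temp qw(tempfile);

# True if $fh was opened for writing (write-only or read/write).
# Returns nothing if the flags cannot be read, e.g. on a closed handle.
sub is_writable_handle {
    my ($fh) = @_;
    my $flags = fcntl($fh, F_GETFL, 0);
    return unless defined $flags;
    my $mode = $flags & O_ACCMODE;
    return $mode == O_WRONLY || $mode == O_RDWR;
}

my ($tmp_fh, $path) = tempfile(UNLINK => 1);    # File::Temp opens read/write
open my $read_fh,  '<', $path or die "open for read: $!";
open my $write_fh, '>', $path or die "open for write: $!";

for my $pair ([read => $read_fh], [write => $write_fh], [temp => $tmp_fh]) {
    my ($label, $fh) = @$pair;
    print "$label: ", (is_writable_handle($fh) ? "writable" : "not writable"), "\n";
}
```

An overridden close could then call flush and sync only when this returns true, rather than matching on the "Invalid argument" message.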