Perl WWW::Mechanize methods not working in AIX - perl

I have a simple requirement: screen-scrape a web page (simple URL-based reports) and write the HTML response to an output file. The URL redirects to an HTTPS login page that uses form-based authentication (no JavaScript); after authentication, the report I am trying to view should show up in $response as HTML. Interestingly, my code works fine on a Windows machine, but the same code (below) does not work on an AIX machine, where the click_button() call appears to do nothing. I have also tried click() and submit(), but neither works, so instead of the actual report all I get in the HTML output file is the logon screen. Any ideas what could be wrong?
use WWW::Mechanize;
use strict;
my $username = "admin";
my $password = "welcome1";
my $outpath = "/home/data/output";
my $days = 7; # number of days to look back
my $url = "https://www.myreports.com/tax_report.php";
my $name = "tax_report";
my $outfile = "$outpath/$name.html";
my $mech = WWW::Mechanize->new(noproxy =>'0');
my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);
$year += 1900;
$mon++; # since it will start from 0
$mday--; # yesterday's date (to day)
my $fromday = $mday - $days; # (from day)
# Create URL extension for generating report with previous date
my $dt_range = "?Y&dc_date1=$mon%2F$fromday%2F$year&dc_date2=$mon%2F$mday%2F$year";
$url = $url . $dt_range; # append to the existing $url instead of redeclaring it
$mech->get($url);
$mech->field(login => "$username");
$mech->field(passwd => "$password");
$mech->add_handler("request_send", sub { shift->dump; return });
$mech->add_handler("response_done", sub { shift->dump; return });
$mech->click_button(value=>"Login now");
my $response = $mech->content();
print "Generating report: $name...\n";
open (OUT, ">>$outfile")|| die "Cannot create report file $outfile";
print OUT "$response";
close OUT;
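A side note on the date handling in the script above: decrementing $mday and then subtracting the look-back can produce zero or negative day numbers near the start of a month. A minimal sketch of one alternative, doing the arithmetic on epoch seconds and formatting with POSIX strftime, is shown below. The helper name date_range_query is made up for illustration, the output is zero-padded (03 rather than 3), and subtracting whole multiples of 86400 seconds can be off by a day around DST changes if the script runs close to midnight.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: build the report's date-range query string for the
# $days days ending yesterday, relative to the epoch time $now.
sub date_range_query {
    my ($now, $days) = @_;
    my $one_day = 24 * 60 * 60;

    # "%%2F" yields a literal "%2F" (a URL-encoded slash) in the output.
    my $to   = strftime('%m%%2F%d%%2F%Y', localtime($now - $one_day));
    my $from = strftime('%m%%2F%d%%2F%Y', localtime($now - ($days + 1) * $one_day));

    return "?Y&dc_date1=$from&dc_date2=$to";
}

print date_range_query(time, 7), "\n";
```

The returned string could then be appended to the report URL in place of $dt_range.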
The WWW::Mechanize version is the same on both machines (1.54), but the Windows machine has Perl 5.10.1 whereas the AIX machine has Perl 5.8.8.
Other alternatives tried:
my $inputobject=$mech->current_form()->find_input( undef,'submit' );
print $inputobject->value . "\n";
$mech->click_button(input => $inputobject);
print $mech->status() . "\n";
$inputobject shows the correct button element, matching the HTML source, and the second print returns a status of 200 (OK). But it is still not working on the AIX machine.
UPDATE: It seems that the site I am trying to connect to has an untrusted SSL certificate. I tried the program on three different machines: a Windows PC, a Mac and AIX. On the Windows machine the program works, and I was able to log in to the website through browsers (Chrome, Firefox, IE). On the Mac, however, logging in through a browser does not work and shows an untrusted certificate error (or warning); this probably means the proxy settings are not set up, and the Perl program does not work there either. Lastly, on AIX the Perl program does not work either. I am not sure how to bypass this untrusted SSL certificate issue. Any help will be appreciated.
UPDATE2: I added the lines below to the script to see logging details and found that I was being redirected (HTTP 302) because my IP was filtered by the server firewall. Once the AIX IP was added to the server's firewall exceptions, the script worked perfectly. The two lines below were the lifesaver:
$mech->add_handler("request_send", sub { shift->dump; return });
$mech->add_handler("response_done", sub { shift->dump; return });
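Besides dumping every request and response, the redirect chain can be summarised after the fact. The sketch below assumes libwww-perl's HTTP::Response is available (WWW::Mechanize depends on it); the helper name redirect_chain is made up for illustration. It walks the responses that preceded the final one and reports each status code with its Location header.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: return one "CODE -> Location" string for each
# redirect response that led up to $response.
sub redirect_chain {
    my ($response) = @_;
    return map {
        my $location = $_->header('Location');
        sprintf '%s -> %s', $_->code,
            defined $location ? $location : '(no Location header)';
    } $response->redirects;
}
```

In the script above this could be called as redirect_chain($mech->response) right after click_button(); an unexpected 302 entry would point to the kind of server-side redirect described in UPDATE2.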

Can you add the following line before my $mech = WWW::Mechanize->new(noproxy =>'0'); in your Perl code and try again?
$ENV{PERL_LWP_SSL_VERIFY_HOSTNAME}=0;
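The same setting can also be passed to the constructor instead of going through the environment. This is a sketch that assumes LWP::UserAgent 6.0 or later, where the ssl_opts option exists; WWW::Mechanize hands constructor options it does not recognise on to LWP::UserAgent. Turning off hostname verification weakens HTTPS and is meant for diagnosis only.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Assumes LWP::UserAgent >= 6.0; older versions do not know ssl_opts.
my $mech = WWW::Mechanize->new(
    noproxy  => '0',
    ssl_opts => { verify_hostname => 0 },   # diagnostic use only
);

# Confirm what the user agent will actually use.
print "verify_hostname = ", $mech->ssl_opts('verify_hostname'), "\n";
```

Either way, the setting has to be in place when the object is constructed, which is why the environment variable must be assigned before the call to new.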

Related

How to retrieve multiple sequences from a database in BioPerl?

I have installed BioPerl 1.6 via ppm and placed a .pl file in the cgi-bin folder of my localhost. When I run it via the URL
http://localhost/cgi-bin/bio2.pl
it says "Internal Server Error".
Any BioPerl file I run, whether as .cgi or .pl, fails with the same Internal Server Error.
#!C:/Perl64/bin/perl.exe
use Bio::SeqIO;
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;
$query = "Arabidopsis[ORGN] AND topoisomerase[TITL] and 0:3000[SLEN]";
$query_obj = Bio::DB::Query::GenBank->new(-db => 'nucleotide', -query => $query );
$gb_obj = Bio::DB::GenBank->new;
$stream_obj = $gb_obj->get_Stream_by_query($query_obj);
while ($seq_obj = $stream_obj->next_seq) {
# do something with the sequence object
print $seq_obj->display_id, "\t", $seq_obj->length, "\n";
}
If I run this code from cmd, it fails with an exception saying GenBank cannot be found.
Please guide me on how to run BioPerl in a Perl CGI script.

Perl script to access SEC EDGAR master files returns file not found when file exists on FTP server

It would be great if someone could help; I am really stuck.
I am downloading the master files from SEC EDGAR, and I got the script from http://brage.bibsys.no/bi/bitstream/URN:NBN:no-bibsys_brage_38213/1/Norli_SRFE_2012.pdf (page 14; published now).
I get the error: 404 master.gz not found.
While debugging I printed the URL, and when I use the same URL in a browser I can download the file. The URL is built correctly up to QTR1, but the script cannot find the file even though it exists. Please help.
1) For debugging I have changed the code to cover only 1995 (I plan to add the years 1995 to 2012 later).
2) It did not work for any file. When I said QTR1 above, I meant that the same code without the file name (just for testing), i.e. ....full-index/1995/QTR1/, returns a status of OK, but ...ftp.sec.gov/edgar/full-index/1995/QTR1/master.gz returns a 404 file not found error. It does not work for any quarter.
I have wasted so much time on this seemingly simple thing, which is supposed to work but just does not. Could you copy and paste this and run it? Does it work for you?
The code below gets the master files from the QTR folders. My code:
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->timeout(250);
$ua->env_proxy;
for($year=1995; $year<1996; $year=$year+1){
for($i=1; $i<5; $i=$i+1){
$quarter = "QTR" . $i;
$filegrag = "ftp://ftp.sec.gov/edgar/full-index/" . $year . "/" . $quarter . "/master.gz";
print $filegrag;
# This command gets the file from EDGAR
my $response = $ua->get($filegrag);
print $response;
print $response->status_line;
# Now just pipe the output to a file named appropriately
$filename = $year . $quarter . "master";
open(MYOUTFILE, ">" . $filename);
if ($response->is_success) {
print MYOUTFILE $response->decoded_content;
}
else {
die $response->status_line;
}
close(MYOUTFILE);
}
}
I realized that there were some firewall issues that were causing the problem I had. Now things are fine.

How to access a simple SOAP Service in Perl

I am currently dabbling with Perl and SOAP, using SOAP::Lite.
I have a simple SOAP server that appears to run fine:
#!perl -w
use SOAP::Transport::HTTP;
use Demo;
# don't want to die on 'Broken pipe' or Ctrl-C
$SIG{PIPE} = $SIG{INT} = 'IGNORE';
my $daemon = SOAP::Transport::HTTP::Daemon
-> new (LocalPort => 801)
-> dispatch_to('/home/soaplite/modules')
;
print "Contact to SOAP server at ", $daemon->url, "\n";
$daemon->handle;
It includes a small class called Demo, which simply retrieves the system's total memory:
Demo.py
#!/usr/bin/perl
use Sys::MemInfo qw(totalmem freemem totalswap);
print "total memory: ".(&totalmem / 1024)."\n";
I have an example of a SOAP client below, written in Perl, although I am unsure how to communicate with the server, since the tutorial I am following goes off on a tangent. For example, how do I retrieve the result of the Demo.py class from the client?
#!perl -w
use SOAP::Lite;
# Frontier http://www.userland.com/
$s = SOAP::Lite
-> uri('/examples')
-> on_action(sub { sprintf '"%s"', shift })
-> proxy('http://superhonker.userland.com/')
;
print $s->getStateName(SOAP::Data->name(statenum => 25))->result;
Any help would be greatly appreciated :)
For the server script, the dispatch_to method takes the path to the package to load and the name of the package itself. If you pass a third parameter, it limits which methods the server makes visible. (E.g., given two methods named memory and time, passing Demo::time as the third parameter makes memory invisible to the client.)
File server.pl
my $daemon = SOAP::Transport::HTTP::Daemon
-> new (LocalPort => 801)
-> dispatch_to('/home/soaplite/modules', 'Demo')
;
Your Demo package should be a package with methods that return the values. I couldn't get Sys::MemInfo compiled on my system, so I just used localtime instead. I'm not sure why you named your package Demo.py, but Perl packages must have the extension .pm, otherwise they won't be loaded properly.
File Demo.pm
#!/usr/bin/perl
package Demo;
#use Sys::MemInfo qw(totalmem freemem totalswap);
sub memory {
#print "total memory: ".(&totalmem / 1024)."\n";
return "Can't load Sys::MemInfo, sorry";
}
sub time {
my $time = localtime;
return $time;
}
1;
For the client code, there are two important pieces that must be specified properly for it to work: the proxy and the uri. The proxy is the URL of the SOAP web service. Since you are running the server script as a daemon process, the path is just the web site's URL. My computer doesn't have a URL, so I used http://localhost:801/. The 801 is the port you specified above. If you were running as a CGI script inside a different web server (such as Apache), you would need to specify the CGI script to call (e.g. http://localhost/cgi-bin/server.pl), changing the package in server.pl to SOAP::Transport::HTTP::CGI.
uri is probably the most confusing, but it's the namespace of the XML returned by the web service. Turn on +trace => 'debug' to see the XML returned by the web service. The uri should just be the name of the server. Even if you switch ports or move to a CGI dispatch method, this uri stays the same.
File test.pl
#!perl -w
use SOAP::Lite +trace => 'debug';
# Frontier http://www.userland.com/
$s = SOAP::Lite->new(proxy => 'http://superhonker.userland.com:801/',
uri => 'http://superhonker.userland.com/');
#might be http://www.userland.com/
#but I could not test sub-domains
print $s->time()->result;
I'll recycle these two answers for tips:
Client of web service in Perl
Remote function call using SOAP::Lite

Why does my Perl CGI program fail with "Software error: ..."?

When I try to run my Perl CGI program, the returned web page tells me:
Software error: For help, please send mail to the webmaster (root@localhost), giving this error message and the time and date of the error.
Here is the code from one of the files:
#!/usr/bin/perl
use lib "/home/ecoopr/ecoopr.com/CPAN";
use CGI;
use CGI::FormBuilder;
use CGI::Session;
use CGI::Carp (fatalsToBrowser);
use CGI::Session;
use HTML::Template;
use MIME::Base64 ();
use strict;
require "./db_lib.pl";
require "./config.pl";
my $query = CGI->new;
my $url = $query->url();
my $hostname = $query->url(-base => 1);
my $login_url = $hostname . '/login.pl';
my $redir_url = $login_url . '?d=' . $url;
my $domain_name = get_domain_name();
my $helpful_msg = $query->param('m');
my $new_trusted_user_fname = $query->param('u');
my $action = $query->param('a');
$new_trusted_user_fname = MIME::Base64::decode($new_trusted_user_fname);
####### Colin: Added July 12, 2009 #######
my $view = $query->param('view');
my $offset = $query->param('offset');
####### Colin: Added July , 2009 #######
#print $session->header;
#print $new_trusted_user;
my $helpful_msg_txt = qq[];
my $helpful_msg_div = qq[];
if ($helpful_msg)
The "please send mail to the webmaster" message you see is a generic message that the web server gives you when anything goes wrong and nothing handles it. It's not at all interesting in terms of solving the actual problem. Check the error log to find possible relevant error output from your program.
Also, go through the advice in my "How do I troubleshoot my Perl CGI script?" to find the problem.
My guess is that you have a syntax error with that dangling if(). What you have posted isn't a valid Perl program.
Good luck,
Is this related to the suEXEC module?
Improper configuration of suEXEC can cause permission errors.
The suEXEC feature provides Apache users the ability to run CGI and SSI programs under user IDs different from the user ID of the calling web server. Normally, when a CGI or SSI program executes, it runs as the same user who is running the web server.
The Apache documentation recommends that you not consider using suEXEC unless you are familiar with managing setuid root programs and the security issues they present.
http://httpd.apache.org/docs/2.2/suexec.html
From the Stack Overflow page "How to trap program crashes with HTTP error code 500":
I see that you include: use CGI::Carp (fatalsToBrowser);
... which stifles the HTTP 500 error. Simply removing this will allow the programs to crash "properly".

Why can't I connect to my CAS server with Perl's AuthCAS?

I'm attempting to use an existing CAS server to authenticate login for a Perl CGI web script and am using the AuthCAS Perl module (v 1.3.1). I can connect to the CAS server to get the service ticket but when I try to connect to validate the ticket my script returns with the following error from the IO::Socket::SSL module:
500 Can't connect to [CAS Server]:443 (Bad hostname '[CAS Server]')
([CAS Server] substituted for real server name)
Symptoms/Tests:
1. If I type the generated URL for the authentication into the web browser's location bar, it returns just fine with the expected XML snippet. So it is not a bad host name.
2. If I write a script that does not use the AuthCAS module but uses the IO::Socket::SSL module directly to query the CAS server for validation of the generated service ticket, the Perl script runs fine from the command line but not in the browser.
3. If I add the AuthCAS module to the script from item 2, the script no longer works on the command line and still doesn't work in the browser.
Here is the bare-bones script that produces the error:
#!/usr/bin/perl
use strict;
use warnings;
use CGI;
use AuthCAS;
use CGI::Carp qw( fatalsToBrowser );
my $id = $ENV{QUERY_STRING};
my $q = new CGI;
my $target = "http://localhost/cgi-bin/testCAS.cgi";
my $cas = new AuthCAS(casUrl => 'https://cas_server/cas');
if ($id eq ""){
my $login_url = $cas->getServerLoginURL($target);
print "Location: $login_url\n\n"; # print, not printf: the URL may contain % escapes
exit 0;
} else {
print $q->header();
print "CAS TEST<br>\n";
## When coming back from the CAS server a ticket is provided in the QUERY_STRING
print "QUERY_STRING = " . $id . "</br>\n";
## $ST should contain the received Service Ticket
my $ST = $q->param('ticket');
my $user = $cas->validateST($target, $ST); #### This is what fails
printf "Error: %s\n", &AuthCAS::get_errors() unless (defined $user);
}
Any ideas on where the conflict might be?
The error is coming from the line directly above the snippet Cebjyre quoted, namely the socket creation:
$ssl_socket = new IO::Socket::SSL(%ssl_options);
All of the input parameters are correct. I edited the module to add debug statements that print all the parameters just before that call, and they are all fine. It looks like I'm going to have to dive deeper into the IO::Socket::SSL module.
As usually happens when I post questions like this, I found the problem. It turns out the Crypt::SSLeay module was either not installed or not up to date. Of course, the error messages didn't give me any clues. After I updated it, all the problems went away and things are working fine now.
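For anyone hitting a similar symptom, a quick way to see which SSL-related modules a given Perl can actually load is sketched below. The helper name module_version and the list of modules checked are illustrative choices, not something AuthCAS prescribes. Running it both from the command line and as a CGI script may help, since the two environments can end up using different Perl installations or library paths.

```perl
use strict;
use warnings;

# Hypothetical helper: return the installed version of $module,
# 'unknown' if it loads but declares no version, or undef if it
# cannot be loaded at all.
sub module_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return undef unless eval { require $file; 1 };
    my $version = $module->VERSION;
    return defined $version ? $version : 'unknown';
}

for my $module (qw(Crypt::SSLeay IO::Socket::SSL Net::SSLeay LWP::Protocol::https)) {
    my $version = module_version($module);
    printf "%-22s %s\n", $module, defined $version ? $version : 'NOT INSTALLED';
}
```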
Well, from the module source it looks like that IO::Socket error is coming from get_https2
[...]
unless ($ssl_socket) {
$errors = sprintf "error %s unable to connect https://%s:%s/\n",&IO::Socket::SSL::errstr,$host,$port;
return undef;
}
[...]
which is called by callCAS, which is called by validateST.
One option is to temporarily edit the module file to put some debug statements in if you can, but if I had to guess, I'd say the casUrl you are supplying isn't matching up to the _parse_url regex properly - maybe you have three slashes after the https?
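To illustrate the kind of failure guessed at above, here is a small sketch of a URL parser built on a regex. The pattern and the helper name parse_cas_url are hypothetical and are not copied from AuthCAS, so the module's real _parse_url may behave differently. With this pattern, a URL with three slashes after the scheme fails to match because the host part would be empty.

```perl
use strict;
use warnings;

# Hypothetical parser, NOT the AuthCAS implementation: split a URL into
# host, port and path, or return an empty list if it does not match.
sub parse_cas_url {
    my ($url) = @_;
    return unless $url =~ m{^(https?)://([^:/]+)(?::(\d+))?(/.*)?$};
    my ($scheme, $host, $port, $path) = ($1, $2, $3, $4);
    $port = $scheme eq 'https' ? 443 : 80 unless defined $port;
    $path = '/' unless defined $path;
    return ($host, $port, $path);
}

for my $url ('https://cas_server/cas', 'https:///cas_server/cas') {
    my @parts = parse_cas_url($url);
    print "$url => ", (@parts ? join(', ', @parts) : 'no match'), "\n";
}
```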