HTTP::Daemon and threads - perl

I have the following code on Windows XP with ActiveState ActivePerl 5.8.
What could be the problem with it? Why does it not work?
I tried to set it as a proxy for my IE, but when I connect to some URLs from IE nothing happens: the code enters the thread function and then stalls.
use HTTP::Daemon;
use threads;
use HTTP::Status;
use LWP::UserAgent;

my $webServer;
my $d = HTTP::Daemon->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 80,
    Listen    => 20
) || die;

print "Web Server started!\n";
print "Server Address: ", $d->sockhost(), "\n";
print "Server Port: ",    $d->sockport(), "\n";

while (my $c = $d->accept) {
    threads->create(\&process_one_req, $c)->detach();
}

sub process_one_req {
    STDOUT->autoflush(1);
    my $c = shift;
    while (my $r = $c->get_request) {
        if ($r->method eq "GET") {
            print "Session info\n", $r->header('Host');
            my $ua       = LWP::UserAgent->new;
            my $response = $ua->request($r);
            $c->send_response($response);
        } else {
            $c->send_error(RC_FORBIDDEN);
        }
    }
    $c->close;
    undef($c);
}

I added the following line to the code before LWP::UserAgent->new and it seems to be working for me (on Linux).
$r->uri("http://" . $r->header('Host') . "/" . $r->uri());
The URI you get from the HTTP::Request object of the original request does not have the hostname, so I added it to make an absolute URI. Tested as follows:
$ curl -D - -o /dev/null -s -H 'Host: www.yahoo.com' http://localhost:8080/
HTTP/1.1 200 OK
Date: Thu, 27 Jan 2011 12:59:56 GMT
Server: libwww-perl-daemon/5.827
Cache-Control: private
Connection: close
Date: Thu, 27 Jan 2011 12:57:15 GMT
Age: 0
---snip--
UPDATE: Looks like I was completely wrong. I didn't need to make the change to the URI object. Your original code worked for me as-is on Linux.

If I recall correctly, this is because of the threading model on Windows, where file handles are not passed between threads unless specifically asked for. This PerlMonks post seems to shed some light on the underlying problem, and may lead to an approach that works for you (I imagine you may be able to call the Windows API on the file descriptor of the client connection to allow access to it within the spawned thread).
Perl threads on Windows generally make my head hurt, while on UNIX-like systems I find them very easy to deal with. Then again, I imagine figuring out how to correctly emulate forked processes with threads, on a system that ONLY supports threads and not forking, would make most people's heads hurt.
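If you want to experiment along those lines, here is an untested sketch of the idea: pass the numeric file descriptor to the worker thread and rebuild a handle around it there, instead of passing the Perl handle itself. The re-bless and the httpd_daemon slot are assumptions about HTTP::Daemon's internals (and $d is the daemon object from the question's code), so treat this as a starting point only:
# Untested sketch (Win32): hand the thread a raw file descriptor.
use IO::Socket::INET;

my @keep;   # naive: keeps accepted handles alive so the descriptor isn't
            # closed while a thread still uses it; a real version needs cleanup
while (my $c = $d->accept) {
    push @keep, $c;
    threads->create(\&process_one_req, fileno($c))->detach();
}

sub process_one_req {
    my $fd = shift;
    my $c  = IO::Socket::INET->new_from_fd($fd, '+<') or return;
    bless $c, 'HTTP::Daemon::ClientConn';    # assumption: relies on internals
    ${*$c}{'httpd_daemon'} = $d;             # send_response() consults the daemon
    while (my $r = $c->get_request) {
        # ... handle the request as in the original code ...
    }
    $c->close;
}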

Related

perl - Net::SSLeay and Server Name Indications

Hello, my friendly Stack Overflow users,
I have the following code that "should" print out the cert for 'cyclingnews.com':
#!/usr/bin/env perl
use strict;
use Net::SSLeay qw(get_https get_https3 make_headers);
my $site = "cyclingnews.com";
my ($p, $resp, $headers, $server_cert) = get_https3($site, 443, '/');
print "$headers\n";
my $PEM = Net::SSLeay::PEM_get_string_X509( $server_cert);
print $PEM, "\n";
Upon inspection of the .pem, I see that the cert belongs to:
Subject: CN = *.ssl.hwcdn.net
X509v3 Subject Alternative Name:
DNS:*.ssl.hwcdn.net, DNS:ssl.hwcdn.net
From what I understand, this looks like an SNI problem: Net::SSLeay is not passing an SSL hostname to 'cyclingnews.com'. With IO::Socket::SSL this can be done with SSL_hostname ( https://metacpan.org/pod/IO::Socket::SSL#Basic-SSL-Client ).
The Net::SSLeay doc ( https://metacpan.org/pod/Net::SSLeay#Basic-set-of-functions ) says "By default get_https() supplies Host (to make virtual hosting easy) and Accept (reportedly needed by IIS) headers."
I am not sure whether this applies to get_https3(), so I have also tried:
#!/usr/bin/env perl
use strict;
use Net::SSLeay qw(get_https get_https3 make_headers);
my $site = "cyclingnews.com";
my ($p, $resp, $headers, $server_cert) = get_https3(
    $site, 443, '/',
    make_headers('Host:' => $site)
);
print "$headers\n";
my $PEM = Net::SSLeay::PEM_get_string_X509( $server_cert);
print $PEM, "\n";
and this looks to pass the Host header, but I still get the same unwanted result.
So I am a bit lost. I'm a noob, but I know the folks on Stack Overflow have a reputation for being friendly, so maybe y'all could give me some pointers.
get_https3, like many similar functions, ultimately ends up in https_cat, where the SSL context setup and the SSL handshake are done. Unfortunately, setting the server_name extension (SNI) is not done in this really old part of the code, which comes from a time when SNI wasn't as essential for using HTTPS as it is today.
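Until that is fixed, one workaround (a sketch; it bypasses get_https3 entirely and uses IO::Socket::SSL, which does send SNI) is to make the TLS connection yourself and pull the peer certificate off the socket:
#!/usr/bin/env perl
# Sketch: fetch the server certificate with SNI via IO::Socket::SSL.
use strict;
use warnings;
use IO::Socket::SSL;
use Net::SSLeay;

my $site = "cyclingnews.com";
my $client = IO::Socket::SSL->new(
    PeerHost     => $site,
    PeerPort     => 443,
    SSL_hostname => $site,   # the SNI extension
) or die "connect failed: $!, $IO::Socket::SSL::SSL_ERROR";

# Called without a field name, peer_certificate() returns the raw
# Net::SSLeay X509 handle, which PEM_get_string_X509() accepts.
my $cert = $client->peer_certificate;
print Net::SSLeay::PEM_get_string_X509($cert), "\n";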

How do I discover on what server the app.psgi process is running?

Is there a way to discover which server app.psgi is running under?
For example, I am looking for ideas on how to complete the following code fragment from app.psgi:
# app.psgi
use Modern::Perl;
use Plack::Builder;

my $app = sub { ... };

my $server = MyApp::GetServerType(); # <--- I need some idea for how to write this...
given ($server) {
    when (/plackup/) { ... do something ... }
    when (/Starman/) { ... do something other ... }
    default          { die "Unknown" }
}
$app;
Checking the PLACK_ENV environment variable is not a solution...
Short answer, inspect the caller:
# app.psgi
# use Modern::Perl;
use feature qw(switch say);
use Carp qw(longmess);
use Plack::Builder;

my $app = sub {
    return [ 200, [ 'Content-Type' => 'text/plain' ], [ 'Hello World' ] ];
};

# This hack gets what we need out of the call stack
my $stack = longmess("Stack:");
# say STDERR $stack;
given ($stack) {
    when (/plackup/) { say STDERR "Server: plackup" }
    when (/Starman/) { say STDERR "Server: starman" }
    default          { die "Server: Unknown" }
}
return $app;
However, doing this in app.psgi will make your code less portable: if you die on an unknown server, people won't be able to run your code in a new environment.
Also, be aware that this code may be run multiple times depending on how the server is implemented so any side effects may occur multiple times.
For example, here is the output for plackup:
plackup --app /usr/lusers/bburnett/dev/trunk/getserver.psgi
Server: plackup
HTTP::Server::PSGI: Accepting connections at http://0:5000/
So far so good. But here is the output for starman:
starman --app /usr/lusers/bburnett/dev/trunk/getserver.psgi
2014/02/21-16:09:46 Starman::Server (type Net::Server::PreFork) starting! pid(27365)
Resolved [*]:5000 to [0.0.0.0]:5000, IPv4
Binding to TCP port 5000 on host 0.0.0.0 with IPv4
Setting gid to "15 15 0 0 15 20920 20921 20927"
Server: starman
Server: starman
Server: starman
Server: starman
Server: starman
Here it gets run once for the master and once per child (defaults to four children).
If you really want something different to happen for these different servers, a more robust way may be to subclass them yourself, put the per-server code into each subclass, and pass -s My::Starman::Wrapper (and the like) to plackup and starman as needed; a sketch of the idea follows.
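For instance, a hypothetical wrapper might look like the sketch below; the class name MyStarman is made up, and the only real convention used is that plackup resolves -s Foo to the class Plack::Handler::Foo:
# Sketch: Plack/Handler/MyStarman.pm, loadable via "plackup -s MyStarman"
package Plack::Handler::MyStarman;
use strict;
use warnings;
use parent 'Plack::Handler::Starman';

sub run {
    my ($self, $app) = @_;
    # Starman-specific setup goes here before the server takes over.
    $self->SUPER::run($app);
}

1;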
If you really want a switch statement and to keep the code in one place, you could look into writing code that calls Plack::Loader or Plack::Runner. Take a look at the source of plackup and you'll see how it wraps Plack::Runner; take a look at the source of Plack::Loader and you'll see how it picks the backend and loads the appropriate server class. A minimal Plack::Runner sketch is below.
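For example, here is a minimal sketch that wraps Plack::Runner directly; the server name and port are placeholders, and $app stands for your PSGI application:
#!/usr/bin/env perl
# Sketch: pick the server explicitly instead of guessing it from the stack.
use strict;
use warnings;
use Plack::Runner;

my $app = sub {
    return [ 200, [ 'Content-Type' => 'text/plain' ], [ 'Hello World' ] ];
};

my $runner = Plack::Runner->new;
$runner->parse_options( '--server', 'Starman', '--port', '5000' );
$runner->run($app);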

Perl WWW::Mechanize methods not working in AIX

I have a simple requirement: screen-scrape a web page (simple URL-based reports) and direct the HTML response to an output file. The URL will, however, redirect to an authentication (HTTPS login) page with form-based authentication (no JavaScript), and upon authentication the report I am trying to view should show up in the $response (as HTML). Interestingly, my code works just fine on a Windows machine, but the same code below is not working on an AIX machine, where it looks like the click_button() call does nothing. I have tried click() and submit(), but none of them work, so instead of getting the actual report all I get is the logon screen in the HTML output file. Any ideas what could be wrong?
use strict;
use WWW::Mechanize;

my $username = "admin";
my $password = "welcome1";
my $outpath  = "/home/data/output";
my $days     = 7;                     # report window in days ($days was never declared in the original)
my $url      = "https://www.myreports.com/tax_report.php";
my $name     = "tax_report";
my $outfile  = "$outpath/$name.html";

my $mech = WWW::Mechanize->new(noproxy => '0');

my ($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst) = localtime(time);
$year += 1900;
$mon++;                               # months start from 0
$mday--;                              # yesterday's date (to day)
my $fromday = $mday - $days;          # (from day)

# Create URL extension for generating the report with the previous date range
my $dt_range = "?Y&dc_date1=$mon%2F$fromday%2F$year&dc_date2=$mon%2F$mday%2F$year";
$url .= $dt_range;                    # append instead of redeclaring $url

$mech->get($url);
$mech->field(login  => $username);
$mech->field(passwd => $password);
$mech->add_handler("request_send",  sub { shift->dump; return });
$mech->add_handler("response_done", sub { shift->dump; return });
$mech->click_button(value => "Login now");

my $response = $mech->content();
print "Generating report: $name...\n";
open my $out, '>>', $outfile or die "Cannot create report file $outfile: $!";
print {$out} $response;
close $out;
The WWW::Mechanize version on both machines is the same (1.54), but the Windows machine's Perl version is 5.10.1 whereas the AIX machine's is 5.8.8.
Other alternatives I tried:
my $inputobject=$mech->current_form()->find_input( undef,'submit' );
print $inputobject->value . "\n";
$mech->click_button(input => $inputobject);
print $mech->status() . "\n";
The $inputobject shows the correct button element as in the HTML source, and the second print returns a status of 200, which stands for OK. But it's still not working on the AIX machine.
UPDATE: It seems that the site I am trying to connect to has an untrusted SSL certificate. I tried the program on three different machines: a Windows PC, a Mac and the AIX box. On the Windows machine the program works, and I was able to log in to the website through the browsers (Chrome, Firefox, IE). On the Mac, however, the login just won't work through the browsers, and it shows an untrusted-certificate error (or warning!); this probably means the proxy settings are not set up, and the Perl program won't work there either. And lastly there's the AIX machine, where the Perl isn't working as well. Not sure how to bypass this untrusted SSL certificate issue here. Any help will be appreciated.
UPDATE 2: I included the two lines below in the script to see the logging details, and found that I was being redirected (HTTP 302) because my IP was filtered by the server's firewall. Once the AIX IP was added to the firewall's exception list, the script worked perfectly. These two lines were the lifesavers:
$mech->add_handler("request_send", sub { shift->dump; return });
$mech->add_handler("response_done", sub { shift->dump; return });
Can you add the following line before the my $mech = WWW::Mechanize->new(noproxy => '0'); line of your Perl code and try again?
$ENV{PERL_LWP_SSL_VERIFY_HOSTNAME}=0;
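On newer LWP (6.x) the same effect can also be had per object via ssl_opts, which WWW::Mechanize passes through to LWP::UserAgent; a sketch (disabling verification is insecure and best kept to testing):
# Sketch: disable certificate checks per object instead of via %ENV.
# Insecure: only for testing against hosts with untrusted certificates.
my $mech = WWW::Mechanize->new(
    noproxy  => '0',
    ssl_opts => { verify_hostname => 0 },
);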

Make timeout work for LWP::UserAgent HTTPS

Solution
As reported by @limulus in the answer I accepted, this was a bug in Net::HTTPS version 6.00. Always be wary of fresh .0 releases. Here's the relevant diff between the buggy and the fixed version of that module:
D:\Opt\Perl512.32 :: diff lib\Net\HTTPS.pm site\lib\Net\HTTPS.pm
6c6
< $VERSION = "6.00";
---
> $VERSION = "6.02";
75,78c75,80
< # The underlying SSLeay classes fails to work if the socket is
< # placed in non-blocking mode. This override of the blocking
< # method makes sure it stays the way it was created.
< sub blocking { } # noop
---
> if ($SSL_SOCKET_CLASS eq "Net::SSL") {
> # The underlying SSLeay classes fails to work if the socket is
> # placed in non-blocking mode. This override of the blocking
> # method makes sure it stays the way it was created.
> *blocking = sub { };
> }
Original question
Relevance: It is annoying to see your HTTPS client block indefinitely because the connection endpoint is unreliable.
This experiment is easy to set up and replay at home. You just need two things: a tarpit to trap an incoming client, and a Perl script. The tarpit can be set up using netcat:
nc -k -l localhost 9999 # on Linux, for multiple requests
nc -l -p 9999 localhost # on Cygwin, for one request only
Then, point the script to this tarpit:
use strict;
use LWP::UserAgent;
use HTTP::Request::Common;

print 'LWP::UserAgent::VERSION  ', $LWP::UserAgent::VERSION,  "\n";
print 'IO::Socket::SSL::VERSION ', $IO::Socket::SSL::VERSION, "\n";

my $ua = LWP::UserAgent->new( timeout => 5, keep_alive => 1 );
$ua->ssl_opts( timeout => 5, Timeout => 5 );   # Yes - see note below!

my $rsp = $ua->request( GET 'https://localhost:9999' );
if ( $rsp->is_success ) {
    print $rsp->as_string;
} else {
    die $rsp->status_line;
}
What is this going to do? Well, connect to the port opened by netcat, and then ... hang. Indefinitely. At least in terms of developer time: it might time out after ten minutes or two hours, but I haven't checked. The specified timeout doesn't take effect, not on Linux, and not on Windows (Win32; I haven't checked Cygwin).
Versions used:
LWP::UserAgent::VERSION 6.02
IO::Socket::SSL::VERSION 1.44
# on Linux
LWP::UserAgent::VERSION 6.02
IO::Socket::SSL::VERSION 1.44
# on Win32
Now for the timeout and Timeout parameters. The former is the name of the parameter for LWP::UA, the latter is the name for IO::Socket::SSL, used via LWP::Protocol::https. (Incidentally, why is metacpan HTTPS? Well, at least it's not a tarpit.) I am somehow hoping to have these parameters passed along :)
Just so you know, keep_alive doesn't have anything to do with the timeout not working, I verified that empirically. :)
Anyway, before digging deeper, does anyone know what's going on here and how to make the timeout work with HTTPS? Hard to believe I'm the first person running into this.
This is a result of the Net::HTTPS module overriding the blocking method of IO::Socket with a noop. Upgrading to the latest Net::HTTP package should fix this.
The timeout (and Timeout) options apply only to the connection -- how many seconds LWP::UserAgent will wait while connecting -- they are not for setting a timeout on the whole transaction.
You'll want to use Perl's alarm with a $SIG{ALRM} handler to timeout the whole transaction. See perldoc -f alarm or perlipc.
local $SIG{ALRM} = sub { die "SSL timeout\n" };

my $ua = LWP::UserAgent->new( timeout => 5, keep_alive => 1 );
$ua->ssl_opts( timeout => 5, Timeout => 5 );

eval {
    alarm(10);
    my $rsp = $ua->request( GET 'https://localhost:9999' );
    if ( $rsp->is_success ) {
        print $rsp->as_string;
    } else {
        die $rsp->status_line;
    }
};
alarm(0);

if ($@) {
    if ( $@ =~ /SSL timeout/ ) {
        warn "request timed out";
    } else {
        die "error in request: $@";
    }
}
(tested on Linux. Alarms can be a bit more cantankerous in Windows/Cygwin)
I asked this question on PerlMonks, and received an answer to the effect that:
The underlying IO::Socket::INET does not support non-blocking sockets
on Win32, thus non-blocking IO::Socket::SSL is not supported on Win32,
which means also, that timeouts don't work (because they are based on
non-blocking). See also http://www.perlmonks.org/?node_id=378675
http://cpansearch.perl.org/src/SULLR/IO-Socket-SSL-1.60/README.Win32
The PerlMonks post pointed to is from 2004. I'm not sure the information still applies; after all, I've seen that the timeout does work on Windows, just not via SSL.

Why does my simple fastCGI Perl script fail?

I'm not of the Perl world, so some of this is new to me. I'm running Ubuntu Hardy LTS with apache2 and mod_fcgid packages installed. I'd like to get MT4 running under fcgid rather than mod-cgi (it seems to run OK with plain-old CGI).
I can't seem to get even a simple Perl script to run under fcgid. I created a simple "Hello World" app and included the code from this previous question to test if FCGI is running.
I named my script HelloWorld.fcgi (currently fcgid is set to handle .fcgi files only). Code:
#!/usr/bin/perl
use FCGI;

print "Content-type: text/html\n\n";
print "Hello world.\n\n";

my $request = FCGI::Request();
if ( $request->IsFastCGI ) {
    print "we're running under FastCGI!\n";
} else {
    print "plain old boring CGI\n";
}
When run from the command line, it prints "plain old boring..." When invoked via an HTTP request to Apache, I get a 500 Internal Server Error and the output of the script is printed to the Apache error log:
Content-type: text/html
Hello world.
we're running under FastCGI!
[Wed Dec 03 22:26:19 2008] [warn] (104)Connection reset by peer: mod_fcgid: read data from fastcgi server error.
[Wed Dec 03 22:26:19 2008] [error] [client 70.23.221.171] Premature end of script headers: HelloWorld.fcgi
[Wed Dec 03 22:26:25 2008] [notice] mod_fcgid: process /www/mt/HelloWorld.fcgi(14189) exit(communication error), terminated by calling exit(), return code: 0
When I run the .cgi version of the same code, it works fine. Any idea why the output of the script is going to the error log? Apache config is the default mod_fcgid config plus, in a VirtualHost directive:
ServerName test1.example.com
DocumentRoot /www/example
<Directory /www/example>
    AllowOverride None
    AddHandler cgi-script .cgi
    AddHandler fcgid-script .fcgi
    Options +ExecCGI +Includes +FollowSymLinks
</Directory>
The problem is that the "Content-Type" header is sent outside of the request loop. You must print the "Content-Type" header for every request. If you move
print "Content-type: text/html\n\n";
to the top of the request loop it should fix the problem.
Also, you need to loop over the requests or you'll get no benefit, so following the first poster's example:
my $request = FCGI::Request();
while ( $request->Accept() >= 0 ) {
    print("Content-type: text/html\n\n");
}
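Putting both fixes together, a corrected HelloWorld.fcgi might look like this (a sketch that simply merges the original script with the request loop):
#!/usr/bin/perl
# Sketch: everything, headers included, is printed inside the Accept loop.
use strict;
use FCGI;

my $request = FCGI::Request();
while ( $request->Accept() >= 0 ) {
    print "Content-type: text/html\n\n";
    print "Hello world.\n\n";
    if ( $request->IsFastCGI ) {
        print "we're running under FastCGI!\n";
    } else {
        print "plain old boring CGI\n";
    }
}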
I use CGI::Fast more than FCGI, but the idea is the same, I think. The goal of FastCGI is to load the program once and iterate in a loop for every request.
FCGI's man page says:
use FCGI;

my $count = 0;
my $request = FCGI::Request();
while ( $request->Accept() >= 0 ) {
    print( "Content-type: text/html\r\n\r\n", ++$count );
}
Which means you have to Accept the request before being able to print anything back to the browser.
Movable Type uses CGI::Fast for FastCGI. The typical FastCGI script runs in a loop, as mat described. A loop that uses CGI::Fast would look like this:
#!/usr/bin/perl
use strict;
use CGI::Fast;

my $count = 0;
while ( my $q = CGI::Fast->new ) {
    print("Content-Type: text/plain\n\n");
    print( "Process ID: $$; Count is: " . ++$count );
}
I tested this script on a server with the FCGI and CGI::Fast modules installed, and the count increments as you'd expect. If the process ID changes, the count will go back to 1 and then increment within that process. Each process has its own variable space, of course.
For MT, enabling FastCGI is a matter of renaming (or symlinking) the .cgi scripts to .fcgi (or making fcgid the handler for .cgi scripts, but that won't work for mt-xmlrpc.cgi, which isn't FastCGI-friendly yet). You'll also need to add some directives to your mt-config.cgi file so that it knows the new script names. Like this:
AdminScript mt.fcgi
CommentsScript mt-comments.fcgi
And so forth. More documentation specific to FastCGI and Movable Type is available on movabletype.org.
Anyway, based on your server's error logs, it looks like FCGI is working and being invoked properly, but your script just isn't running in a loop waiting for the next request to come along. So your test script did accomplish its task of reporting whether FastCGI is configured, and you should now be able to reconfigure MT to use FastCGI.