LWP::UserAgent get callback with timeout - perl

With this code, is the chunk lost if the alarm times out (2 seconds), or does get retry downloading the missed chunk?
use LWP::UserAgent;

my $url       = '...';
my $file_name = '...';
my $ua = LWP::UserAgent->new();

open my $fh, '>>:raw', $file_name or die $!;
my $res = $ua->get(
    $url,
    ':content_cb' => sub {
        my ( $chunk, $res, $proto ) = @_;
        eval {
            local $SIG{ALRM} = sub { die "time out\n" };
            alarm 2;
            print $fh $chunk;
            alarm 0;
        };
        # ...
    },
);
close $fh;

If the ':content_cb' callback is called for a chunk, then that means the chunk has been successfully returned from the request; the LWP::UserAgent layer has done its job at that point (with respect to that chunk). Your program is then responsible for doing whatever it wants with the chunk. LWP::UserAgent has no idea how your program sets or handles system signals, so it can't possibly redo any request, or re-notify your program of any chunk, in response to a system signal or any other event that goes on in the context of your program (and outside the context of LWP::UserAgent).
Furthermore, it should be mentioned that even if you set the LWP::UserAgent timeout property, which applies to pending server activity (such as responding to the request at all, or sending the next chunk), LWP::UserAgent would still not resend the request in the case of such a timeout. The module has simply not been designed to do that under any circumstances:
The request is aborted if no activity on the connection to the server is observed for timeout seconds.
You can always resend the request in your code if any kind of timeout occurs, or if your code deems that it has not received sufficient response data from the server.
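For example, a minimal retry sketch (the URL, the 10-second timeout, and the three attempts here are illustrative, not from the original code):

use LWP::UserAgent;

# Resend the whole request a few times if it fails or times out.
my $ua  = LWP::UserAgent->new( timeout => 10 );
my $url = 'http://example.com/file';

my $res;
for my $try ( 1 .. 3 ) {
    $res = $ua->get($url);
    last if $res->is_success;
    warn "Attempt $try failed: ", $res->status_line, "\n";
}
die "Giving up on $url\n" unless $res && $res->is_success;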

If you are worried about timeouts, use the timeout method.
In your code, when a chunk of data arrives, LWP::UserAgent calls your ':content_cb' callback, and no retry is done.
IMO it makes no sense to handle a timeout that way, because it will never happen (unless print $fh $chunk; takes a long time).
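If the goal is to bound the wall-clock time of the whole transfer rather than a single print, a rough sketch is one alarm around the entire get (the URL, file name, and 30-second budget are made up; note that if the alarm fires while LWP is collecting content, LWP catches the die itself and reports it via the X-Died response header):

use LWP::UserAgent;

my $ua  = LWP::UserAgent->new;
my $res = eval {
    local $SIG{ALRM} = sub { die "whole-download timeout\n" };
    alarm 30;
    my $r = $ua->get(
        'http://example.com/big_file',
        ':content_file' => 'big_file.out',
    );
    alarm 0;
    $r;
};
die $@ if $@;    # the alarm fired outside LWP's internal eval
if ( my $died = $res->header('X-Died') ) {
    warn "Transfer aborted: $died";
}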

Related

Detecting if internet is connected in perl

I have this perl script to extract the source code of a webpage:
#!/usr/bin/perl
use LWP::UserAgent;
my $ou = new LWP::UserAgent;
my $url = "http://google.com";
my $source = $ou->get("$url")->decoded_content;
print "$source\n";
Now I want to check whether the internet connection is up before extracting the source code.
The simplest way to detect whether a remote server is offline is to attempt to connect to it. Using LWP to send a HEAD request (instead of GET) retrieves just the HTTP header information without any content, and you should get a swift response from any server that is online.
The default timeout of an LWP::UserAgent object is three minutes, so you will need to set it to something much shorter for a rapid test.
This program temporarily sets the timeout to 0.5 seconds, sends a HEAD request, and reports that the server is not responding if the result is an error of any sort. The original timeout value is restored before carrying on.
Depending on the real server that you want to test, you will need to adjust the timeout carefully to avoid getting false negatives.
use strict;
use warnings 'all';

use constant URL => 'http://www.google.com/';
use LWP;

my $ua = LWP::UserAgent->new;

{
    my $to  = $ua->timeout(0.5);
    my $res = $ua->head(URL);
    unless ( $res->is_success ) {
        die sprintf "%s is not responding (%s)\n", URL, $res->status_line;
    }
    $ua->timeout($to);
}

Perl Net::SSLeay check if socket is available to read

I am using a perl module called Net::APNS::Persistent. It helps me open up a persistent connection with Apple's APNS server and send push notifications through APNS. This module uses Net::SSLeay for SSL communication with the APNS server.
Now, I want to read from my socket periodically to check whether APNS sends back any response. Net::APNS::Persistent already has a function called _read() which looks like this:
sub _read {
    my $self = shift;
    my ($socket, $ctx, $ssl) = @{$self->_connection};
    my $data = Net::SSLeay::ssl_read_all( $ssl );
    die_if_ssl_error("error reading from ssl connection: $!");
    return $data;
}
However, this function works only after APNS drops the connection and I get an error while trying to write. At other times my script gets stuck at:
my $data = Net::SSLeay::ssl_read_all( $ssl );
I checked the Net::SSLeay doc and found it has a method called peek:
Copies $max bytes from the specified $ssl into the returned value. In contrast to the Net::SSLeay::read() function, the data in the SSL buffer is unmodified after the SSL_peek() operation.
I thought it might be useful, so I added another function to the Net::APNS::Persistent module:
sub ssl_peek {
    my $self = shift;
    my ($socket, $ctx, $ssl) = @{$self->_connection};
    print "Peeking \n";
    my $data = Net::SSLeay::peek( $ssl, $pending );
    print "Done peeking \n";
    return $data;
}
Unfortunately this also gave me the same problem. It only prints Peeking and never reaches the line where it would print Done peeking. I had the same problem using Net::SSLeay::read. Is there a way to check whether the socket can be read, or maybe to set a read timeout so that my script doesn't get stuck while trying to read from the socket?
The APNS documentation says the following:
If you send a notification that is accepted by APNs, nothing is returned.
If you send a notification that is malformed or otherwise unintelligible, APNs returns an error-response packet and closes the connection. Any notifications that you sent after the malformed notification using the same connection are discarded, and must be resent.
As long as your notifications are accepted, there won't be any data to read, and thus a read operation on the socket will block. The only time there's data available is when there's an error, and then the connection is immediately closed. That should explain the behaviour you're observing.
To check whether the underlying socket can be read, use select, i.e.
IO::Select->new(fileno($socket))->can_read($timeout);
$timeout can be 0 to just check and not wait, a number of seconds to wait that long, or undef to wait forever. But before you do the select, check whether data is still pending inside the SSL buffer:
if (Net::SSLeay::pending($ssl)) { ... use SSL_peek or SSL_read ... }
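Putting the two checks together, a rough sketch of a non-blocking read, following the naming in the question's code (the 0.5-second timeout is arbitrary):

use IO::Select;

# Sketch: drain whatever APNS has already sent without blocking forever.
sub read_nonblocking {
    my $self = shift;
    my ($socket, $ctx, $ssl) = @{$self->_connection};

    my $data = '';
    # First take anything already decrypted inside the SSL buffer ...
    while ( my $pending = Net::SSLeay::pending($ssl) ) {
        $data .= Net::SSLeay::read( $ssl, $pending );
    }
    # ... then poll the raw socket briefly for a new TLS record.
    if ( IO::Select->new( fileno($socket) )->can_read(0.5) ) {
        my $got = Net::SSLeay::read($ssl);
        $data .= $got if defined $got;
    }
    return $data;
}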
Apart from that, it does look like the module you use does not even attempt to validate the server's certificate :(

Basic network chat app in Perl

I am trying to write a basic network chat app in Perl for learning purposes. I currently have a server and client program that function almost as I want them to. Multiple clients can connect to the server and send messages to and from it. However, I'm not really sure how to go about sending messages from one client to another and would appreciate a push in the right direction here. Here is the code I have so far, thoughts?
Note: This is my first ever attempt at using networking or using Perl for a proper project so any other guidance on how it's written would also be appreciated.
chat_server.pl
#!/usr/bin/perl -w
# chat_server.pl
use strict;
use IO::Socket::INET;
my $port = shift or die "Port required!\n";
my $socket = IO::Socket::INET->new(
    LocalPort => $port,
    Proto     => 'tcp',
    Listen    => SOMAXCONN,
) or die "Can't create socket: $!!\n";

my $child;
print "Listening for clients on $port...\n";

REQUEST:
while (my $client = $socket->accept) {
    my $addr = gethostbyaddr($client->peeraddr, AF_INET);
    my $port = $client->peerport;

    if ($child = fork) {
        print "New connection from $addr:$port\n";
        close $client;
        next REQUEST;
    }
    die "fork failed!\n" unless defined $child;

    while (<$client>) {
        print "[$addr:$port] says: $_";
        print $client "[$addr:$port] says: $_";
    }
}
close $socket;
chat_client.pl
#!/usr/bin/perl -w
# chat_client.pl
use strict;
use IO::Socket::INET;
my $port = shift or die "No port\n";
my $server = shift or die "No server\n";
my $client_socket = IO::Socket::INET->new(
    PeerPort => $port,
    PeerAddr => $server,
    Proto    => 'tcp',
) or die "Can't create send socket: $!!\n";

my $child;
if ($child = fork) {
    while (1) {
        sleep(1);
        print scalar <$client_socket>;
    }
}
die "fork failed!\n" unless defined $child;

print "Connected to $server:$port!\n";
do {
    print "> ";
    print $client_socket $_ if defined $_;
} while (<STDIN>);

print "Closing connection";
close $client_socket;
A single client to a single server isn't too difficult - what you're doing with your code there is - effectively - creating a 1 to 1 relationship. Your forked server is talking exclusively to your client.
To get information to propagate (via the server) between multiple clients, you're going to have to get a bit more complicated - because you have separate processes, these processes now need to communicate with each other. This is a big enough question that there's a whole segment of the perl documentation about it: perlipc.
This is actually going to increase the complexity of your code substantially, because you're moving to a 1-to-many relationship on your communications, and they'll all be happening asynchronously.
Socket based communication is one form of inter-process communication (IPC) and you're already doing that. But your 'gotcha' here is that you're moving from 1 to 1 comms to 1 to many comms. You need to be able to broadcast, and this mode of communications doesn't support that particularly well.
What I would suggest is look at IO::Pipe - I've some example code here: How to extract data from Parallel::ForkManager in perl
Then use IO::Select and can_read to asynchronously decide if there's any data coming in on the pipe. You'll probably need an array of pipes - one per client - otherwise you might get concurrent stuff overlapping.
E.g., from the IO::Pipe doc page:
my $pipe = IO::Pipe->new();

if ($pid = fork()) {           # Parent
    $pipe->reader();
    while (<$pipe>) {
        ...
    }
}
elsif (defined $pid) {         # Child
    $pipe->writer();
    print $pipe ...
}
Unfortunately there's a slight gotcha here - your pipes will be created by the forking process, but that in turn means you'll need to figure out how to handle an array of pipes and check whether they're readable.
That means you can't sit in a while loop around accepting sockets any more - that blocks, so you'd have messages queued until another client connects (which is really not going to be what you want). So you'll also need to use select again to check whether there's something ready to accept first.
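For comparison, here is an untested single-process sketch that sidesteps fork and pipes entirely by putting the listening socket and every client socket into one IO::Select loop (the broadcast logic is deliberately minimal, and mixing buffered line reads with select has known pitfalls):

#!/usr/bin/perl
# select_chat_server.pl -- single process, no fork, no pipes
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

my $port = shift or die "Port required!\n";
my $listen = IO::Socket::INET->new(
    LocalPort => $port,
    Proto     => 'tcp',
    Listen    => SOMAXCONN,
    Reuse     => 1,
) or die "Can't create socket: $!\n";

my $select = IO::Select->new($listen);

while (1) {
    # can_read blocks until the listener or a client is readable
    for my $fh ( $select->can_read ) {
        if ( $fh == $listen ) {
            $select->add( $listen->accept );   # new client
        }
        elsif ( defined( my $line = <$fh> ) ) {
            # Broadcast the line to every other connected client
            for my $peer ( $select->handles ) {
                next if $peer == $listen or $peer == $fh;
                print $peer $line;
            }
        }
        else {
            $select->remove($fh);              # EOF: client left
            close $fh;
        }
    }
}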

Cancel Download using WWW::Mechanize in Perl

I have written a Perl script which would check a list of URLs and connect to them by sending a GET request.
Now, let's say that one of these URLs points to a file which is very big, for instance one with a size > 100 MB.
When a request is sent to download this file using this:
my $mech = WWW::Mechanize->new();
my $url  = "http://somewebsitename.com/very_big_file.txt";
$mech->get($url);
Once the GET request is sent, it will start downloading the file. I want to be able to cancel the download using WWW::Mechanize. How can I do that?
I checked the documentation of this Perl Module here:
http://metacpan.org/pod/WWW::Mechanize
However, I could not find a method which would help me do this.
Thanks.
Aborting a GET request
Using the :content_cb option, you can provide a callback function to get() that will be executed for each chunk of response content received from the server. You can set* the chunk size (in bytes) using the :read_size_hint option. These options are documented in LWP::UserAgent (get() in WWW::Mechanize is just an overridden version of the same method in LWP::UserAgent).
The following request will be aborted after reading 1024 bytes of response content:
use WWW::Mechanize;

sub callback {
    my ($data, $response, $protocol) = @_;
    die "Too much data";
}

my $mech = WWW::Mechanize->new;
my $url  = 'http://www.example.com';

$mech->get($url, ':content_cb' => \&callback, ':read_size_hint' => 1024);
print $mech->response()->header('X-Died');
Output:
Too much data at ./mechanize line 12.
Note that the die in the callback does not cause the program itself to die; it simply sets the X-Died header in the response object. You can add the appropriate logic to your callback to determine under what conditions a request should be aborted.
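For instance, a sketch that aborts only once a byte budget is exceeded (the URL and the 1 MB limit are made up):

use WWW::Mechanize;

my $limit    = 1024 * 1024;    # arbitrary 1 MB budget
my $received = 0;

my $mech = WWW::Mechanize->new;
$mech->get(
    'http://www.example.com/big_file',
    ':content_cb' => sub {
        my ($data, $response, $protocol) = @_;
        $received += length $data;
        die "Exceeded $limit bytes\n" if $received > $limit;
    },
    ':read_size_hint' => 8192,
);
if ( my $died = $mech->response()->header('X-Died') ) {
    print "Aborted: $died";
}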
Don't even fetch URL if content is too large
Based on your comments, it sounds like what you really want is to never send a request in the first place if the content is too large. This is quite different from aborting a GET request midway through, since you can fetch the Content-Length header with a HEAD request and perform different actions based on the value:
my @urls = qw(http://www.example.com http://www.google.com);

foreach my $url (@urls) {
    $mech->head($url);
    if ($mech->success) {
        my $length = $mech->response()->header('Content-Length') // 0;
        next if $length > 1024;
        $mech->get($url);
    }
}
Note that according to the HTTP spec, applications should set the Content-Length header. This does not mean that they will (hence the default value of 0 in my code example).
* According to the documentation, the protocol module "will try to read data from the server in chunks of this size", but I don't think that's guaranteed.

IO::Socket timing out when getting response

I'm trying to connect to a web service using IO::Socket::INET (yes, I know that there are lots of better modules for doing this, but I don't have them and can't add them, so please don't suggest it), but I'm timing out (I think that's what it's doing) waiting for a response.
Here's the basic crux of my code (I've previously populated $content with all the proper headers, and set it up, etc.):
$httpSock->print($content);

my @lines = $httpSock->getlines();
foreach my $line ( @lines ) {
    print $line;
}
It appears that my request is made immediately, then it waits about 2 minutes before spitting back the response. If I alter the code to use a raw socket recv instead of getlines(), like this:
$httpSock->recv($data, 1024);
I get the response immediately (although only the first 1024 chars). Am I doing something wrong here? I'm using a late enough version of IO::Socket that autoflush should be enabled by default, but turning it on explicitly didn't seem to make any difference. I could probably also just keep reading from the socket until I got the entire response, but that's definitely messier than using getlines() or <$httpSock>.
Thanks in advance.
I'm having an issue re-creating the problem with the code snippet you've posted. Here's the code I tested with:
use strict;
use warnings;
use IO::Socket;
my $httpSock = IO::Socket::INET->new(
    PeerAddr => 'www.google.com',
    PeerPort => '80',
    Proto    => 'tcp',
);

my $content = "HEAD / HTTP/1.0\r\nHost: www.google.com\r\n\r\n";
$httpSock->print($content);

my @lines = $httpSock->getlines();
foreach my $line (@lines) {
    print $line;
}
Here are the results:
$ time ./1.pl
HTTP/1.0 200 OK
-snip-
real 0m0.084s
user 0m0.025s
sys 0m0.007s
The problem is that getlines() waits until the connection is closed. If the web service you are connecting to doesn't close your connection, the getlines function will wait, thinking more data is on the way. When your connection times out after those 2 minutes or so, getlines sees the connection close and returns the lines it received. recv, on the other hand, grabs everything up to the predetermined limit that is on the connection at that time and returns it to the buffer you hand it immediately, though it will wait until it gets some data if there is none currently. I know you think it's messy, but this might work out for you:
use Socket qw(MSG_DONTWAIT);    # non-blocking recv flag

my $buf = "";
$httpSock->recv($buf, 1024);    # the first read blocks until some data arrives
my $message = "";
while (length($buf) > 0) {
    $message .= $buf;
    $buf = "";                  # reset so a failed non-blocking recv ends the loop
    $httpSock->recv($buf, 1024, MSG_DONTWAIT);
}
print $message;
The MSG_DONTWAIT flag causes recv not to wait for a message if the connection is empty. You can also increase 1024 to some bigger number to decrease the number of loops, and possibly even get the whole message at once.
This should also let you keep the socket open for further use until you close it yourself.
I am wondering if the google example works because google.com is closing the connection after it responds. (It does: with HTTP/1.0 and no Connection: keep-alive header, the server closes the connection once the response is sent, which is why getlines() returns promptly there.)