Mojo::UserAgent non-blocking vs blocking performance - perl

I have the following code:
my $ua = Mojo::UserAgent->new;
my @ids = qw(id1 id2 id3);
foreach (@ids) {
    my $input = $_;
    my $res = $ua->get('http://my_site/rest/id/' . $input . '.json' => sub {
        my ($ua, $res) = @_;
        print "$input =>" . $res->result->json('/net/id/desc'), "\n";
    });
}
Mojo::IOLoop->start unless Mojo::IOLoop->is_running;
Why does the above (non-blocking) code take about 6 seconds to run, while running it as blocking code, i.e. with something like this inside the loop:
my $res = $ua->get('http://my_site/rest/id/'.$input.'.json');
print "$input =>" . $res->result->json('/net/id/desc'), "\n";
and without the last line (the Mojo::IOLoop->start), takes about 1 second?
Why is the blocking code faster than the non-blocking code?

The first thing to check is when things actually happen. I couldn't reproduce the same delay. Remember to try each way several times to spot outliers caused by a network hiccup. Note that the second argument to the non-blocking callback is a transaction object, normally written as $tx, whereas the response object is what's normally written as $res:
use Mojo::Base -strict;    # strict, warnings, say
use Mojo::UserAgent;
use Mojo::Util qw(steady_time);

say "Begin: " . steady_time();
END { say "End: " . steady_time() }

my $ua = Mojo::UserAgent->new;
my @ids = qw(id1 id2 id3);
foreach (@ids) {
    my $input = $_;
    my $url   = "http://my_site/rest/id/$input.json";
    my $res   = $ua->get(
        $url =>
        sub {
            my ($ua, $tx) = @_;
            print "Fetched\n";
        }
    );
}

# run the event loop so the non-blocking requests actually complete
Mojo::IOLoop->start unless Mojo::IOLoop->is_running;
One possibility is that keep-alive is holding an open connection. What happens if you turn that off?
my $res = $ua->get(
    $url =>
    { Connection => 'close' } =>
    sub {
        my ($ua, $tx) = @_;
        print "Fetched\n";
    }
);
Here's a version that uses promises, which you'll want to get used to as more Mojo stuff moves to it:
use Mojo::Base -strict;    # strict, warnings, say
use feature qw(signatures);
no warnings qw(experimental::signatures);

use Mojo::Promise;
use Mojo::UserAgent;
use Mojo::Util qw(steady_time);

my $ua = Mojo::UserAgent->new;

say "Begin: " . steady_time();
END { say "End: " . steady_time() }

my @ids = qw(id1 id2 id3);

my @gets = map {
    $ua->get_p( 'http://www.perl.com' )->then(
        sub ( $tx ) { say "Fetched: " . steady_time() },
        sub { print "Error: @_" },
    );
} @ids;

Mojo::Promise->all( @gets )->wait;
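If you later want to cap how many of those requests run at once, recent Mojolicious releases also have Mojo::Promise->map with a concurrency option. A rough sketch (check that your installed version has map before relying on it):
# Sketch only: assumes a Mojolicious new enough to provide Mojo::Promise->map
Mojo::Promise->map(
    { concurrency => 2 },                         # at most two requests in flight
    sub { $ua->get_p( 'http://www.perl.com' ) },  # each element of @ids is in $_
    @ids
)->then( sub { say "All fetched: " . steady_time() } )->wait;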

Related

Sequential requests with Mojo::IOLoop::Delay

I need to fetch a list of URLs in non-blocking mode, but not in parallel: the requests should run sequentially, one by one. How can I achieve that?
I cannot find any examples; the documentation and articles highlight only parallel execution.
Now my code looks like the following (simplified):
my $delay = Mojo::IOLoop::Delay->new;

$delay->steps(
    sub {
        build_report();
        say "done";
    }
);

sub parse_data {
    ...;
    my $url = shift @urls;
    my $end = $delay->begin;
    $ua->get( $url => \&parse_data );
    $end->();
}

my $end = $delay->begin;
$ua->get( $url => \&parse_data );
$end->();

$delay->wait;
I want to avoid multiple closures by using Mojo::IOLoop::Delay.
I have just started looking at Mojo::IOLoop myself for a project and I'm not au fait with it yet. But I think the simplest way is to build an array of closures and pass it to $delay->steps.
use strict;
use warnings 'all';
use feature 'say';

use Mojo;

use constant URL => 'http://stackoverflow.com/questions/tagged/perl';

STDOUT->autoflush;

my $ua    = Mojo::UserAgent->new;
my $delay = Mojo::IOLoop::Delay->new;

my @urls = ( URL ) x 10;

my @steps = map {
    my $url = $_;
    sub {
        my $end = $delay->begin;
        $ua->get( $url => sub {
            my ( $ua, $txn ) = @_;
            $end->();
            if ( my $err = $txn->error ) {
                say $err->message;
            }
            else {
                my $res = $txn->success;
                say "@{$res}{qw/ code message /}";
            }
        });
    }
} @urls;

$delay->steps( @steps, sub { say 'Done' } );

Mojo::IOLoop->start unless Mojo::IOLoop->is_running;
output
200 OK
200 OK
200 OK
200 OK
200 OK
200 OK
200 OK
200 OK
200 OK
Done
200 OK
Note that Done is printed before the status line of the final get, because I have called $end->() as soon as the callback arrives, on the assumption that any handling of the response doesn't need to be synchronised.
If you don't want that then just move $end->() to the end of the callback, as sketched below. Then the delay will wait until the output has been generated before sending another request.
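For clarity, the reordered callback would look like this (same body as above, only $end->() has moved):
$ua->get( $url => sub {
    my ( $ua, $txn ) = @_;

    if ( my $err = $txn->error ) {
        say $err->message;
    }
    else {
        my $res = $txn->success;
        say "@{$res}{qw/ code message /}";
    }

    # Calling $end->() only now makes the delay wait for this output
    # before the next step fires off another request
    $end->();
});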
Sequential requests are very easy to implement.
Web server
#!/usr/bin/perl
use Mojo::Base -strict;
use Mojolicious::Lite;
get '/' => sub {
    my $c = shift->render_later;
    $c->delay(
        sub {
            my $delay = shift;
            $c->ua->get('https://google.ru' => $delay->begin);
        },
        sub {
            my ($delay, $tx) = @_;
            $tx->res->body;    # body of google.ru
            $c->ua->get('https://twitter.com' => $delay->begin);
        },
        sub {
            my ($delay, $tx) = @_;
            $tx->res->body;    # body of twitter.com
            $c->render(text => 'Done');
        },
    );
};
app->start;
Script
#!/usr/bin/perl
use Mojo::Base -strict;
use Mojo::IOLoop;
use Mojo::UserAgent;
my $ua = Mojo::UserAgent->new;

Mojo::IOLoop->delay(
    sub {
        my $delay = shift;
        $ua->get('https://www.google.ru' => $delay->begin);
    },
    sub {
        my ($delay, $tx) = @_;
        $tx->res->body;    # body of google.ru
        $ua->get('https://twitter.com' => $delay->begin);
    },
    sub {
        my ($delay, $tx) = @_;
        $tx->res->body;    # body of twitter.com
        warn 'DONE';
    },
)->wait;
Script (dynamic requests)
Example here
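The linked example isn't reproduced here, but a rough sketch of the dynamic case (my own, assuming a plain list of URLs) is to let each callback start the next request:
#!/usr/bin/perl
use Mojo::Base -strict;

use Mojo::IOLoop;
use Mojo::UserAgent;

my $ua   = Mojo::UserAgent->new;
my @urls = ('https://www.google.ru', 'https://twitter.com');

sub fetch_next {
    # stop the loop once the (dynamically built) list is exhausted
    my $url = shift @urls or return Mojo::IOLoop->stop;

    $ua->get($url => sub {
        my ($ua, $tx) = @_;
        warn "Fetched $url\n";
        fetch_next();    # only now start the next request
    });
}

fetch_next();
Mojo::IOLoop->start unless Mojo::IOLoop->is_running;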

AnyEvent::STOMP::Client + AnyEvent::ForkManager = Intermittent Error

I'm trying to write a process that listens to ActiveMQ and, based on the message, goes out and grabs data from a webservice, does some processing, and then sends the processed data to another webservice (REST/JSON).
The module below works fine until one of the wonky webservices I talk to returns an error. I've tried many things to catch the error but to no avail, yet. Once the webservice error happens though I get the following message:
unhandled callback exception on event (MESSAGE,
AnyEvent::STOMP::Client=HASH(0x3ad5e48), HASH(0x3a6bbb0)
{"action":"created","data":{"id":40578737,"type":"alert","who":null},"guid":"ADCCEE0C-73A7-11E6-8084-74B346D1CA67","hostname":"myserver","pid":48632}):
$fork_manager->start() should be called within the manager process
OK, I conceptually understand that a child process is trying to start another process and that ForkManager is saying that's a no-no. But given the module below, what is the proper way to start a new process to handle the long-running processing? Or why does a dying child process cause this exception, and how can I prevent it?
Here's the module (stripped down)
package consumer;
use AnyEvent::ForkManager;
use AnyEvent::STOMP::Client;
use JSON;
use Data::Dumper;
use v5.18;
use Moose;
sub run {
    my $self = shift;

    my $pm    = AnyEvent::ForkManager->new(max_workers => 20);
    my $stomp = AnyEvent::STOMP::Client->new();

    $stomp->connect();

    $stomp->on_connected(sub {
        my $stomp = shift;
        $stomp->subscribe('/topic/test');
        say "Connected to STOMP";
    });

    $pm->on_start(sub {
        my ($pm, $pid, @params) = @_;
        say "Starting $pid worker";
    });

    $pm->on_finish(sub {
        my ($pm, $pid, @params) = @_;
        say "Finished $pid worker";
    });

    $pm->on_error(sub {
        say Dumper(\@_);
    });

    $stomp->on_message(sub {
        my ($stomp, $header, $body) = @_;
        my $href = decode_json $body;
        $pm->start(
            cb => sub {
                my ($pm, @params) = @_;
                $self->process(@params);
            },
            args => [ $href->{id}, $href->{data}->{type}, $href->{data}->{who} ],
        );
    });

    my $cv = AnyEvent->condvar;
    $cv->recv;
}

sub process {
    say "Processing " . Dumper(\@_);
    sleep 5;
    if ( int(rand(10)) < 5 ) {
        die "OOPS";    # this triggers the error message above
    }
    say "Done Processing $_[1]";
}
1;
Here's the driver for the module above:
#!/usr/bin/env perl
use v5.18;
use lib '.';
use consumer;
my $c = consumer->new();
$c->run;
Finally a traffic generator that you can use to see this in action:
#!/usr/bin/env perl
use lib '../lib';
use lib '../../lib';
use v5.18;
use Data::Dumper;
use JSON;
use Net::STOMP::Client;
$ENV{'scot_mode'} = "testing";
my $stomp = Net::STOMP::Client->new(
    host => "127.0.0.1",
    port => 61613,
);
$stomp->connect();
for (my $i = 1; $i < 1000000; $i++) {
    my $href = {
        id   => $i,
        type => "event",
        what => "foo",
    };
    my $json = encode_json $href;
    say "Sending " . Dumper($href);
    $stomp->send(
        destination => "/topic/test",
        body        => $json,
    );
}
$stomp->disconnect();
I was able to solve this by using Try::Catch and wrapping the call to $self->process with a try/catch, like this:
$stomp->on_message(sub {
    my ($stomp, $header, $body) = @_;
    my $href = decode_json $body;
    $pm->start(
        cb => sub {
            my ($pm, @params) = @_;
            try {
                $self->process(@params);
            }
            catch {
                # error handling stuff
            };
        },
        args => [ $href->{id}, $href->{data}->{type}, $href->{data}->{who} ],
    );
});

Understanding async in Perl with a specific example

I have to write a script that gets some URLs in parallel and does some work. In the past I have always used Parallel::ForkManager for such things, but now I wanted to learn something new and try asynchronous programming with AnyEvent (and AnyEvent::HTTP or AnyEvent::Curl::Multi) ... but I'm having problems understanding AnyEvent and writing a script that should:
open a file (every line is a separate URL)
(from now on in parallel, but with a limit of e.g. 10 concurrent requests)
read the file line after line (I don't want to load the whole file into memory - it might be big)
make an HTTP request for that URL
read the response
update the MySQL record accordingly
(next file line)
I have read many manuals and tutorials, but it's still hard for me to understand the differences between blocking and non-blocking code. I found a similar script at http://perlmaven.com/fetching-several-web-pages-in-parallel-using-anyevent, where Mr. Szabo explains the basics, but I still can't understand how to implement something like:
...
open my $fh, "<", $file;
while ( my $line = <$fh> ) {
    # http request, read response, update MySQL
}
close $fh;
...
... and add a concurrency limit in this case.
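The furthest I got on my own is this rough, untested sketch with AnyEvent::HTTP; the hand-rolled counter for limiting concurrency is just my guess at how it's supposed to be done:
#!/usr/bin/perl
use strict;
use warnings;

use AnyEvent;
use AnyEvent::HTTP;

my $max_concurrent = 10;    # my guess at a sensible cap
my $active         = 0;

my $cv = AnyEvent->condvar;

open my $fh, '<', 'urls.txt' or die "urls.txt: $!";

sub start_more {
    # keep starting requests until the cap is reached or the file is exhausted
    while ( $active < $max_concurrent and defined( my $url = <$fh> ) ) {
        chomp $url;
        $active++;
        $cv->begin;
        http_get $url, sub {
            my ( $data, $hdr ) = @_;
            # ... read response, update MySQL here ...
            $active--;
            $cv->end;
            start_more();    # a finished request frees a slot for the next line
        };
    }
}

$cv->begin;    # guard so the condvar can't fire before the first batch starts
start_more();
$cv->end;
$cv->recv;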
I would be very grateful for help ;)
UPDATE
Following Ikegami's advice I gave Net::Curl::Multi a try. I'm very pleased with the results. After years of using Parallel::ForkManager just for concurrent grabbing of thousands of URLs, Net::Curl::Multi seems to be awesome.
Here is my code with a while loop on the filehandle. It seems to work as it should, but considering it's my first time writing something like this, I would like to ask more experienced Perl users to take a look and tell me if there are potential bugs, something I missed, etc.
Also, if I may ask: as I don't fully understand how Net::Curl::Multi's concurrency works, please tell me whether I should expect any problems with putting a MySQL UPDATE command (via DBI) inside the RESPONSE loop (besides higher server load, obviously - I expect the final script to run with about 50 concurrent N::C::M workers, maybe more); there's a sketch of what I mean right after the code.
#!/usr/bin/perl
use Net::Curl::Easy qw( :constants );
use Net::Curl::Multi qw( );
sub make_request {
    my ( $url ) = @_;
    my $easy = Net::Curl::Easy->new();
    $easy->{url} = $url;
    $easy->setopt( CURLOPT_URL,        $url );
    $easy->setopt( CURLOPT_HEADERDATA, \$easy->{head} );
    $easy->setopt( CURLOPT_FILE,       \$easy->{body} );
    return $easy;
}

my $maxWorkers = 10;

my $multi   = Net::Curl::Multi->new();
my $workers = 0;

my $i = 1;
open my $fh, "<", "urls.txt";
LINE: while ( my $url = <$fh> )
{
    chomp( $url );
    $url .= "?$i";
    print "($i) $url\n";

    my $easy = make_request( $url );
    $multi->add_handle( $easy );
    $workers++;

    my $running = 0;
    do {
        my ($r, $w, $e) = $multi->fdset();
        my $timeout = $multi->timeout();
        select $r, $w, $e, $timeout / 1000
            if $timeout > 0;

        $running = $multi->perform();
        RESPONSE: while ( my ( $msg, $easy, $result ) = $multi->info_read() ) {
            $multi->remove_handle( $easy );
            $workers--;
            printf( "%s getting %s\n", $easy->getinfo( CURLINFO_RESPONSE_CODE ), $easy->{url} );
        }

        # don't max out the CPU while waiting
        select( undef, undef, undef, 0.01 );
    } while ( $workers == $maxWorkers || ( eof && $running ) );

    $i++;
}
close $fh;
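To make that question concrete, the UPDATE I have in mind would go into the RESPONSE loop of the script above roughly like this (the table and SQL are made up; the handle and statement would be prepared once, before the LINE loop):
use DBI;

# Hypothetical schema; prepared once, before the LINE loop above
my $dbh = DBI->connect( 'dbi:mysql:mydb', 'user', 'password', { RaiseError => 1 } );
my $sth = $dbh->prepare( 'UPDATE pages SET http_status = ? WHERE url = ?' );

# ... and the RESPONSE loop would grow one extra call:
RESPONSE: while ( my ( $msg, $easy, $result ) = $multi->info_read() ) {
    $multi->remove_handle( $easy );
    $workers--;
    my $code = $easy->getinfo( CURLINFO_RESPONSE_CODE );
    # execute() blocks, but a single-row UPDATE should be quick; the curl
    # transfers only make progress inside $multi->perform() anyway
    $sth->execute( $code, $easy->{url} );
    printf( "%s getting %s\n", $code, $easy->{url} );
}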
Net::Curl is a rather good library that's extremely fast. Furthermore, it can handle parallel requests too! I'd recommend using this instead of AnyEvent.
use Net::Curl::Easy qw( :constants );
use Net::Curl::Multi qw( );

sub make_request {
    my ( $url ) = @_;
    my $easy = Net::Curl::Easy->new();
    $easy->{url} = $url;
    $easy->setopt( CURLOPT_URL,        $url );
    $easy->setopt( CURLOPT_HEADERDATA, \$easy->{head} );
    $easy->setopt( CURLOPT_FILE,       \$easy->{body} );
    return $easy;
}

my $max_running = 10;
my @urls = ( 'http://www.google.com/' );

my $multi = Net::Curl::Multi->new();

my $running = 0;
while (1) {
    while ( @urls && $running < $max_running ) {
        my $easy = make_request( shift( @urls ) );
        $multi->add_handle( $easy );
        ++$running;
    }

    last if !$running;

    my ( $r, $w, $e ) = $multi->fdset();
    my $timeout = $multi->timeout();
    select( $r, $w, $e, $timeout / 1000 )
        if $timeout > 0;

    $running = $multi->perform();
    while ( my ( $msg, $easy, $result ) = $multi->info_read() ) {
        $multi->remove_handle( $easy );
        printf( "%s getting %s\n", $easy->getinfo( CURLINFO_RESPONSE_CODE ), $easy->{url} );
    }
}
This does exactly what you want, asynchronously, by wrapping Net::Curl in a safe fashion:
#!/usr/bin/env perl
package MyDownloader;
use strict;
use warnings qw(all);

use Moo;
extends 'YADA::Worker';

has '+use_stats' => (default => sub { 1 });
has '+retry'     => (default => sub { 10 });

after init => sub {
    my ($self) = @_;

    $self->setopt(
        encoding => '',
        verbose  => 1,
    );
};

after finish => sub {
    my ($self, $result) = @_;

    if ($self->has_error) {
        print "ERROR: $result\n";
    } else {
        # do the interesting stuff here
        printf "Finished downloading %s: %d bytes\n", $self->final_url, length ${$self->data};
    }
};

around has_error => sub {
    my $orig = shift;
    my $self = shift;

    return 1 if $self->$orig(@_);
    return 1 if $self->getinfo('response_code') =~ m{^5[0-9]{2}$}x;
};

1;

package main;
use strict;
use warnings qw(all);

use Carp;
use YADA;

my $q = YADA->new(
    max     => 8,
    timeout => 30,
);

open(my $fh, '<', 'file_with_urls_per_line.txt')
    or croak "can't open queue: $!";
while (my $url = <$fh>) {
    chomp $url;
    $q->append(sub {
        MyDownloader->new($url)
    });
}
close $fh;
$q->wait;

Measure individual time taken using perl AnyEvent

I have a requirement to fetch many HTTP URLs, and I use AnyEvent::HTTP to do this.
For every URL I need to measure the time taken. How can I do this?
My code (stripped down) is here:
#!/usr/bin/perl
use strict;
use AnyEvent::HTTP;
use AnyEvent::Socket;
use Data::Dumper;
my $internal_ip = v192.168.2.103;    # Use this ip to bind instead of the default ip. Hardcoding necessary :-( using v$ip

sub prep_cb {
    my ($socket) = @_;
    my $bind = AnyEvent::Socket::pack_sockaddr undef, $internal_ip;
    # I need to start the timer here
    bind $socket, $bind
        or die "bind: $!";
}

my $url = "http://192.168.2.105/echo.php";

my $anyevent = AnyEvent->condvar;
$anyevent->begin;

http_request(
    "GET" => $url,
    on_prepare => \&prep_cb,
    sub {
        my ($data, $hdr) = @_;
        $anyevent->end;
        # I need to measure the time taken
        print Dumper([$data, $hdr]);
    }
);

$anyevent->recv;
What if you replace your http_request() with the following:
my $timer;
http_request(
    "GET" => $url,
    on_prepare => sub { $timer = time; prep_cb(@_) },
    sub {
        my ($data, $hdr) = @_;
        $anyevent->end;
        print "Took " . (time - $timer) . " seconds.\n";
        print Dumper([$data, $hdr]);
    }
);
A simpler way is to have a variable, update it in on_prepare, and log it after $anyevent->end, as mentioned by TheAmigo.
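A minimal sketch of that variant (my own; it reuses prep_cb from the question and AnyEvent->time, though plain time would do):
my $started;

http_request(
    "GET" => $url,
    # record the start time just before the socket is prepared
    on_prepare => sub { $started = AnyEvent->time; prep_cb(@_) },
    sub {
        my ($data, $hdr) = @_;
        $anyevent->end;
        # logged after $anyevent->end, as suggested
        printf "Request took %.3f seconds\n", AnyEvent->time - $started;
    }
);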
A general way to profile/time any function: assuming your function is fetchHttpUrl($url), you could call it like this:
profile(\&fetchHttpUrl, $url);
sub profile {
    my ($function, @arguments) = @_;

    my $startTime = currentTimeInMilliseconds();
    $function->(@arguments);
    my $durationInMs = currentTimeInMilliseconds() - $startTime;

    print "{" . getMethodNameFromPointer($function) . "(" . join(",", @arguments) . ")" . "} : $durationInMs ms";
}
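Those two helpers aren't defined in the answer; one possible way to write them (my own sketch, using Time::HiRes and the core B module) would be:
use Time::HiRes ();
use B ();

# Millisecond wall-clock time via Time::HiRes
sub currentTimeInMilliseconds {
    return int( Time::HiRes::time() * 1000 );
}

# Recover the name a code reference was compiled under ("__ANON__" for anonymous subs)
sub getMethodNameFromPointer {
    my ($code_ref) = @_;
    my $gv = B::svref_2object($code_ref)->GV;
    return $gv->NAME;
}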

Using HTTP::Server::Simple::CGI, how do I get the headers?

Basically, my question is similar to
How do I access HTTP request headers in HTTP::Server::Simple::CGI?
The answer was to use parse_headers(), but there was no example of how to use it properly. I tried to use parse_headers() but I'm not getting any result; it just stops at parse_headers() like the program is stuck. I couldn't add a comment on the question above since I don't have enough rep to do so, so I created this new question.
Below is my sample code, basically the example code from CPAN with just the parse_headers part added:
#!/usr/bin/perl
{
    package MyWebServer;

    use HTTP::Server::Simple::CGI;
    our @ISA = qw(HTTP::Server::Simple::CGI);
    use Data::Dumper;

    my %dispatch = (
        '/hello.cgi' => \&resp_hello,
        # ...
    );

    sub handle_request {
        my $self = shift;
        my $cgi  = shift;

        my $path    = $cgi->path_info();
        my $handler = $dispatch{$path};

        my $header = $self->parse_headers();
        open F, qq{>>~/MyWebServer.log};
        my $dump = Data::Dumper->Dump([$header], [qw($header)]);
        print F $dump;
        close F;

        if (ref($handler) eq "CODE") {
            print "HTTP/1.0 200 OK\r\n";
            $handler->($cgi);
        } else {
            print "HTTP/1.0 404 Not found\r\n";
            print $cgi->header,
                  $cgi->start_html('Not found'),
                  $cgi->h1('Not found'),
                  $cgi->end_html;
        }
    }

    sub resp_hello {
        my $cgi = shift;    # CGI.pm object
        return if !ref $cgi;

        my $who = $cgi->param('name');

        print $cgi->header,
              $cgi->start_html("Hello"),
              $cgi->h1("Hello $who!"),
              $cgi->end_html;
    }

} # end of package MyWebServer

# start the server on port 8080
my $pid = MyWebServer->new(8080)->background();
print "Use 'kill $pid' to stop server.\n";
Only added this part:
my $header = $self->parse_headers();
open F,qq{>>~/MyWebServer.log};
my $dump = Data::Dumper->Dump([$header], [qw($header)]);
print F $dump;
close F;
My objective is to get all the headers and dump them into a file.
Add
sub headers {
    my ( $self, $headers ) = @_;

    if ( $headers ) {
        $self->{__last_headers} = { @$headers };
    }

    return $self->{__last_headers};
}
Then, inside handle_request, use my $header = $self->headers(); instead of calling parse_headers().
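Applied to the code in the question, the change would look roughly like this (the lexical filehandle and explicit log path are my own adjustments, since open doesn't expand ~):
sub handle_request {
    my ( $self, $cgi ) = @_;

    my $path    = $cgi->path_info();
    my $handler = $dispatch{$path};

    # HTTP::Server::Simple passes the parsed headers to headers() while it
    # reads the request, so the override above has already stashed them here
    my $header = $self->headers();

    open my $log, '>>', "$ENV{HOME}/MyWebServer.log" or die "log: $!";
    print {$log} Data::Dumper->Dump( [$header], [qw($header)] );
    close $log;

    # ... dispatch exactly as before ...
}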
FWIW, I'm curious why you're using HTTP::Server::Simple::CGI instead of Mojolicious or Dancer or even HTTP::Server::Simple::PSGI. https://metacpan.org/pod/PSGI is about portability.