Using HTTP::Server::Simple::CGI, how do I get the headers? - perl

Basically, my question is similar to
How do I access HTTP request headers in HTTP::Server::Simple::CGI?
The answer was to use parse_headers(), but there was no example how to use it properly. I tried to use parse_headers() but I'm not getting any result, it just stops at parse_headers() like the program is stucked. I couldn't add a comment on the question above since I don't have enough rep to do so, so I created this new question.
Below is my sample code, basically the example code from CPAN just added the parse_headers:
#!/usr/bin/perl
{
package MyWebServer;
use HTTP::Server::Simple::CGI;
our #ISA = qw(HTTP::Server::Simple::CGI);
use Data::Dumper;
my %dispatch = (
'/hello.cgi' => \&resp_hello,
# ...
);
sub handle_request {
my $self = shift;
my $cgi = shift;
my $path = $cgi->path_info();
my $handler = $dispatch{$path};
my $header = $self->parse_headers();
open F,qq{>>~/MyWebServer.log};
my $dump = Data::Dumper->Dump([$header], [qw($header)]);
print F $dump;
close F;
if (ref($handler) eq "CODE") {
print "HTTP/1.0 200 OK\r\n";
$handler->($cgi);
} else {
print "HTTP/1.0 404 Not found\r\n";
print $cgi->header,
$cgi->start_html('Not found'),
$cgi->h1('Not found'),
$cgi->end_html;
}
}
sub resp_hello {
my $cgi = shift; # CGI.pm object
return if !ref $cgi;
my $who = $cgi->param('name');
print $cgi->header,
$cgi->start_html("Hello"),
$cgi->h1("Hello $who!"),
$cgi->end_html;
}
} # end of package MyWebServer
# start the server on port 8080
my $pid = MyWebServer->new(8080)->background();
print "Use 'kill $pid' to stop server.\n";
Only added this part:
my $header = $self->parse_headers();
open F,qq{>>~/MyWebServer.log};
my $dump = Data::Dumper->Dump([$header], [qw($header)]);
print F $dump;
close F;
My objective is to get all the headers and dump it into a file.

Add
sub headers {
my( $self, $headers ) = #_;
if( $headers ){
$self->{__last_headers} = { #$headers };
}
return $self->{__last_headers};
}
Then inside handle_request use my $header = $self->headers();
FWIW, i'm curious why you're using HTTP::Server::Simple::CGI instead of Mojolicious or Dancer or even HTTP::Server::Simple::PSGI. https://metacpan.org/pod/PSGI is portability.

Related

Mojo::UserAgent non-blocking vs blocking performance

I have the following code:
my $ua = Mojo::UserAgent->new ();
my #ids = qw(id1 id2 id3);
foreach (#ids) {
my $input = $_;
my $res = $ua->get('http://my_site/rest/id/'.$input.'.json' => sub {
my ($ua, $res) = #_;
print "$input =>" . $res->result->json('/net/id/desc'), "\n";
});
}
Mojo::IOLoop->start unless Mojo::IOLoop->is_running;
Why when I run the above code (non-blocking) does it take about 6 seconds while when running the code as blocking, i.e. inside the loop something like:
my $res = $ua->get('http://my_site/rest/id/'.$input.'.json');
print "$input =>" . $res->result->json('/net/id/desc'), "\n";
without the latest line it takes about 1 second?
Why is the blocking code faster than the non-blocking code?
The first thing to check when things happened. I couldn't get the same delay. Remember to try each way several times to spot outliers where there's a network hiccup. Note that the second argument to the non-blocking sub is a transaction object, normally written as $tx, where the response object is normally written res:
use Mojo::Util qw(steady_time);
say "Begin: " . steady_time();
END { say "End: " . steady_time() }
my $ua = Mojo::UserAgent->new ();
my #ids = qw(id1 id2 id3);
foreach (#ids) {
my $input = $_;
my $res = $ua->get(
$url =>
sub {
my ($ua, $tx) = #_;
print "Fetched\n";
}
);
}
One possibility is that keep-alive is holding an open connection. What happens if you turn that off?
my $res = $ua->get(
$url =>
{ Connection => 'close' }
sub {
my ($ua, $tx) = #_;
print "Fetched\n";
}
);
Here's a version that uses promises, which you'll want to get used to as more Mojo stuff moves to it:
use feature qw(signatures);
no warnings qw(experimental::signatures);
use Mojo::Promise;
use Mojo::Util qw(steady_time);
use Mojo::UserAgent;
my $ua = Mojo::UserAgent->new;
say "Begin: " . steady_time();
END { say "End: " . steady_time() }
my #ids = qw(id1 id2 id3);
my #gets = map {
$ua->get_p( 'http://www.perl.com' )->then(
sub ( $tx ) { say "Fetched: " . steady_time() },
sub { print "Error: #_" }
);
} #ids;
Mojo::Promise->all( #gets )->wait;

Sequentioal requests with Mojo::IOLoop::Delay

I need to get list of URLs in non-blocking mode, but not in parallel. It should be sequential requests one by one. How can I realize that?
I cannot find examples. Documentation and articles highlight only parallel execution.
Now my code looks like the following (simplified):
my $delay = Mojo::IOLoop::Delay->new;
$delay->steps(
sub {
build_report();
say "done";
}
);
sub parse_data {
...;
my $url = shift #urls;
my $end = $delay->begin;
$ua->get( $url => \&parse_data );
$end->();
}
my $end = $delay->begin;
$ua->get( $url => \&parse_data );
$end->();
$delay->wait;
I want to avoid multiple closures by using Mojo::IOLoop::Delay.
I have just started looking at Mojo::IOLoop myself for a project and I'm not au fait with it yet. But I think the simplest way is to build an array of closures and pass it to $delay->steps.
use strict;
use warnings 'all';
use feature 'say';
use Mojo;
use constant URL => 'http://stackoverflow.com/questions/tagged/perl';
STDOUT->autoflush;
my $ua = Mojo::UserAgent->new;
my $delay = Mojo::IOLoop::Delay->new;
my #urls = ( URL ) x 10;
my #steps = map {
my $url = $_;
sub {
my $end = $delay->begin;
$ua->get( $url => sub {
my ( $ua, $txn ) = #_;
$end->();
if ( my $err = $txn->error ) {
say $err->message;
}
else {
my $res = $txn->success;
say "#{$res}{qw/ code message /}";
}
});
}
} #urls;
$delay->steps( #steps, sub { say 'Done' } );
Mojo::IOLoop->start unless Mojo::IOLoop->is_running;
output
200 OK
200 OK
200 OK
200 OK
200 OK
200 OK
200 OK
200 OK
200 OK
Done
200 OK
Note that Done is printed before the status line of the final get, because I have called $end->() immediately the callback arrives, assuming that any handling of the response doesn't need to be synchronised
If you don't want that then just move $end->() to the end of the callback. Then the delay will wait until the output has been generated before sending another request
Sequential requests very easy to implement.
Web server
#!/usr/bin/perl
use Mojo::Base -strict;
use Mojolicious::Lite;
get '/' => sub {
my $c = shift->render_later;
$c->delay(sub {
my $delay = shift;
$c->ua->get('https://google.ru' => $delay->begin);
}, sub {
my ($delay, $tx) = #_;
$tx->res->body; # body of the google.ru
$c->ua->get('https://twitter.com' => $delay->begin);
}, sub {
my ($delay, $tx) = #_;
$tx->res->body; # body of the twitter.com
$c->render(text => 'Done');
});
};
app->start;
Script
#!/usr/bin/perl
use Mojo::Base -strict;
use Mojo::IOLoop;
use Mojo::UserAgent;
my $ua = Mojo::UserAgent->new;
Mojo::IOLoop->delay(sub {
my $delay = shift;
$ua->get('https://www.google.ru' => $delay->begin);
}, sub {
my ($delay, $tx) = #_;
$tx->res->body; # body of the google.ru
$ua->get('https://twitter.com' => $delay->begin);
}, sub {
my ($delay, $tx) = #_;
$tx->res->body; # body of the twitter.com
warn 'DONE';
})->wait;
Script (dynamic requests)
Example here

Getting the body of an http POST request, using mod-perl 2

I am writing a quick script to munge a submitted file, and return that content to the user.
My test code looks like this:
#!/path/to/bin/perl
use strict;
use warnings;
use utf8;
use Apache2::RequestRec;
use Apache2::RequestIO;
my ( $xmlin, $accepts ) = (q{}, q{});
my $format = 'json';
# read the posted content
while (
Apache2::RequestIO::read($xmlin, 1024)
) {};
{
no warnings;
$accepts = $Apache2::RequestRec::headers_in{'Accepts'};
}
if ($accepts) {
for ($accepts) {
/application\/xml/i && do {
$format = 'xml';
last;
};
/text\/plain/i && do {
$format = 'text';
last;
};
} ## end for ($accepts)
} ## end if ($accepts)
print "format: $format; xml: $xmlin\n";
This code fails to compile with Undefined subroutine &Apache2::RequestIO::read
If I comment out the while loop, the code runs fine.
Unfortunately the Apache2::RequestIO code is pulled in via Apache2::XSLoader::load __PACKAGE__; so I can't check the actual code.... but I don't understand why this doesn't work
(and yes, I've also tried $r->read(...), to no avail)
I think I have a good idea of why your code is not working.
The module Apache2::RequestIO added new functionality to Apache2::RequestRec.
In other words to add new methods/functions to the Apache2::RequestRec namespace.
I would first change Apache2::RequestIO::read to Apache2::RequestRec::read.
If that does not work move use a handler.
I have code that works which does a similar the thing
In your httpd.conf
PerlSwitches -I/path/to/module_dir
PerlLoadModule ModuleName
PerlResponseHandler ModuleName
ModuleName.pm
package ModuleName;
use strict;
use warnings;
use Apache2::RequestIO();
use Apache2::RequestRec();
use Apache2::Const -compile => qw(OK);
sub handler {
my ($r) = #_;
{
use bytes;
my $content = '';
my $offset = 0;
my $cnt = 0;
do {
$cnt = $r->read($content,8192,$offset);
$offset += $cnt;
} while($cnt == 8192);
}
return Apache2::Const::HTTP_OK;
}
I also use Apache2::RequestIO to read the body:
sub body {
my $self = shift;
return $self->{ body } if defined $self->{ body };
$self->apr->read( $self->{ body }, $self->headers_in->get( 'Content-Length' ) );
$self->{ body };
}
In this case you should subclass original Apache2::Request. Especially pay attention to our #ISA = qw(Apache2::Request);
I do not know why, but standard body method return me:
$self->body # {}
$self->body_status # Missing parser
when Content-Type is application/json. So I work around that in such way. Then parse body myself:
sub content {
my $self = shift;
return $self->{ content } if defined $self->{ content };
my $content_type = $self->headers_in->get('Content-Type');
$content_type =~ s/^(.*?);.*$/$1/;
return unless exists $self->{ $content_type };
return $self->{ content } = $self->{ $content_type }( $self->body, $self );
}
where:
use JSON;
sub new {
my ($proto, $r) = #_;
my $self = $proto->SUPER::new($r);
$self->{ 'application/json' } = sub {
decode_json shift;
};
return $self;
}

Where can I find the request body in HTTP::Server::Simple

I have the following simple server:
And I am trying to locate where the request body (or content) is.
I have tried dumping $self and $cgi but they didn't contain the field (I am asuming because they don't carry any information regarding the request)
How can I get the request body ?
package MyWebServer;
use strict;
use HTTP::Server::Simple::CGI;
use base qw(HTTP::Server::Simple::CGI);
use Data::Dumper;
my %dispatch = (
'/hello' => \&resp_hello,
# ...
);
sub handle_request {
my $self = shift;
my $cgi = shift;
my $path = $cgi->path_info();
my $handler = $dispatch{$path};
print "printing self in request".Dumper($cgi);
my $req = $cgi->get_request;
if (ref($handler) eq "CODE") {
print "HTTP/1.0 200 OK\r\n";
$handler->($cgi, "asd");
} else {
print "HTTP/1.0 404 Not found\r\n";
print $cgi->header,
$cgi->start_html('Not found'),
$cgi->h1('Not found'),
$cgi->end_html;
}
}
sub resp_hello($$) {
my ($cgi, $asd) = #_; # CGI.pm object
my $who = $cgi->param('name');
print $cgi->header,
$cgi->start_html("Hello"),
$cgi->h1("Hello world!!"),
$cgi->h2("Azdh $asd");
$cgi->end_html;
}
# start the server on port 8080
my $pid = MyWebServer->new(8081)->background();
print "Use 'kill $pid' to stop server.\n";
EDIT: Here is an example request:
use strict;
require LWP::UserAgent;
my $ua = LWP::UserAgent->new;
my $req = HTTP::Request->new(GET => "http://localhost:8081/hello");
$req->content("<foo>3.14</foo>"); # the request body
my $resp = $ua->request($req);
if ($resp->is_success) {
my $message = $resp->decoded_content;
print "Received reply: $message\n";
}
else {
print "HTTP GET error code: ", $resp->code, "\n";
print "HTTP GET error message: ", $resp->message, "\n";
}
It's a bit old, but facing the same issue, here's the solution :
$cgi->param('POSTDATA');
That's all you need to retreive the Body contents.
cheers.
The request object you obtained using the line $req = $cgi->get_request is a CGI::Request object. Since this is a request object, it will have only attributes (parameters passed on to the request). Please note that only response objects will have content. So, to see all the parameters you have passed, you can use the as_string() object method as mentioned below.
print $req->as_string;
For more information about accessing individual parameters of the request object, please see CGI::Request documentation in http://search.cpan.org/~mrjc/cvswebedit-v2.0b1/cvs-web/lib/CGI/Request.pm.

Download Specific Images

I'm trying to search and download specific images /front and back cover / of a website if found but whatever I do I always download only one of them. What should I change in my code to download both of them if found?
use strict;
use warnings;
use LWP;
use LWP::UserAgent;
use URI::Escape;
use HTTP::Status;
getCover(...);
sub getCover {
......
while ($title_found =~ /'(http:\/\/images.blu-ray.com\/movies\/covers\/\d+_.*?)'/gis) {
$url = getSite($1);
if ($title_found =~ /front/) {
$filename = 'front.jpg';
}
elsif ($title_found =~ /back/) {
$filename = 'back.jpg';
}
}
my $dir = 'somepath'.$filename;
open F, ">", $dir;
binmode F;
print F $url;
close F;
return 0;
}
sub getSite {
$url = shift;
print "URL: $url\n";
my $r;
my $ua = new LWP::UserAgent();
$ua->agent("Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3) Gecko/20030312");
my $req = new HTTP::Request GET => $url;
$req->push_header("Accept-Language", "en");
$req = $ua->prepare_request($req);
my $res = $ua->request($req);
my $rc = $res->code;
if(is_success($rc)){
$r = $res->as_string();
$r = $res->content();
}
else {
print "Failed\n";
}
return $r;
}
Try putting the part that saves to 'somepath'.$filename inside the while loop instead of outside it.
Also, it appears that $title_found is supposed to contain multiple URLs. In that case, you need to save $1 to a temporary variable, and look for front/back in that instead of in $title_found. Otherwise, you'll wind up saving both covers to front.jpg.