How can I get the entire request body with CGI.pm? - perl

I'm trying to write a Perl CGI script to handle XML-RPC requests, in which an XML document is sent as the body of an HTTP POST request.
The CGI.pm module does a great job at extracting named params from an HTTP request, but I can't figure out how to make it give me the entire HTTP request body (i.e. the XML document in the XML-RPC request I'm handling).
If not CGI.pm, is there another module that would be able to parse this information out of the request? I'd prefer not to have to extract this information "by hand" from the environment variables. Thanks for any help.

You can get the raw POST data by using the special parameter name POSTDATA.
use CGI;

my $q   = CGI->new;
my $xml = $q->param('POSTDATA');
Alternatively, you could read STDIN directly instead of using CGI.pm, but then you lose all the other useful stuff that CGI.pm does.
The POSTDATA trick is documented in the excellent CGI.pm documentation.
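For completeness, a minimal self-contained sketch of a handler along those lines (the response body here is only a placeholder, not a real XML-RPC reply):
use strict;
use warnings;
use CGI;

my $q   = CGI->new;
my $xml = $q->param('POSTDATA');   # the raw request body, e.g. the XML-RPC payload

print $q->header('text/xml');
# ... parse $xml and build the real XML-RPC response here ...
print "<methodResponse><params></params></methodResponse>\n";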

Right, one could use POSTDATA, but that only works if the request Content-Type has not been set to 'multipart/form-data'.
If it is set to 'multipart/form-data', CGI.pm does its own content processing and POSTDATA is not initialized.
So, other options include $cgi->query_string and/or $cgi->Dump.
The $cgi->query_string returns the contents of the POST in a GET format (param=value&...), and there doesn't seem to be a way to simply get the contents of the POST STDIN as they were passed in by the client.
So, to get the actual content of the standard input of a POST request, if modifying CGI.pm is an option for you, you could change the code around line 620 to save the contents of @lines in a variable, for example:
$self->{standard_input} = join '', @lines;
and then access it through $cgi->{standard_input}.

To handle all cases, including those when Content-Type is multipart/form-data, read (and put back) the raw data, before CGI does.
use strict;
use warnings;
use IO::Handle;
use IO::Scalar;

STDIN->blocking(1);  # make sure we read everything

my $cgi_raw = '';
{
    local $/;                       # slurp mode
    $cgi_raw = <STDIN>;             # read the entire raw request body
    my $s;
    tie *STDIN, 'IO::Scalar', \$s;  # replace STDIN with an in-memory handle
    print STDIN $cgi_raw;           # put the raw data back
    tied(*STDIN)->setpos(0);        # rewind so CGI.pm can read it again
}

use CGI qw(:standard);
...
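After that block runs, $cgi_raw holds the exact bytes the client sent, and CGI.pm's normal parsing still works against the copy that was put back on STDIN, so (assuming, for illustration, a form field named payload) both views of the request are available:
my $raw   = $cgi_raw;           # the untouched request body
my $field = param('payload');   # CGI.pm's usual parsing still works (field name is just an example)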

Related

Get raw request body using perl CGI.pm

I am building a web server using Apache and Perl CGI which processes the POST requests sent to it. The processing part requires me to get the completely unprocessed data from the request and verify its signature.
The client sends two different kinds of POST requests: one with the Content-Type set to application/json, and the other with Content-Type application/x-www-form-urlencoded.
I was able to fetch the application/json data using $cgi->param('POSTDATA'). But if I do the same for the application/x-www-form-urlencoded data, i.e. $cgi->param('payload'), the data I get is already decoded. I want it in its original URL-encoded format, i.e. the unprocessed data exactly as the client sent it.
I am doing this for verifying requests sent out by Slack.
To handle all cases, including those when Content-Type is multipart/form-data, read (and put back) the raw data, before CGI does.
use strict;
use warnings;
use IO::Handle;
use IO::Scalar;

STDIN->blocking(1);  # make sure we read everything

my $cgi_raw = '';
{
    local $/;                       # slurp mode
    $cgi_raw = <STDIN>;             # read the entire raw request body
    my $s;
    tie *STDIN, 'IO::Scalar', \$s;  # replace STDIN with an in-memory handle
    print STDIN $cgi_raw;           # put the raw data back
    tied(*STDIN)->setpos(0);        # rewind so CGI.pm can read it again
}

use CGI qw(:standard);
...
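With $cgi_raw captured that way, the Slack verification itself is roughly as below. This is only a sketch based on Slack's v0 signing scheme (HMAC-SHA256 over "v0:<timestamp>:<raw body>" with your signing secret); the header names and scheme should be double-checked against Slack's documentation:
use Digest::SHA qw(hmac_sha256_hex);

# Assumptions: $cgi_raw holds the raw body captured above, and the signing
# secret is provided via the environment. Apache exposes the request headers
# to CGI scripts as HTTP_* environment variables.
my $secret    = $ENV{SLACK_SIGNING_SECRET};
my $timestamp = $ENV{HTTP_X_SLACK_REQUEST_TIMESTAMP};
my $their_sig = $ENV{HTTP_X_SLACK_SIGNATURE};

my $base_string = "v0:$timestamp:$cgi_raw";
my $my_sig      = 'v0=' . hmac_sha256_hex($base_string, $secret);

if ($my_sig eq $their_sig) {
    # signature verified; safe to process the request
}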
I'm not sure which Perl module can handle it all for you, but here is a basic rundown.
Your HTML form should submit to a .cgi file (or any other handler which is properly defined).
The raw request is something similar to this:
POST / HTTP/1.1
User-Agent: Mozilla/5.0
Content-Length: 69
Host: 127.0.0.1
(More headers depending on the situation, then a single blank line)
(Message body containing the data)
username=John&password=123J (example)
(See https://en.wikipedia.org/wiki/List_of_HTTP_header_fields)
What happens is that this data is made available to your script via the CGI mechanism (not the Perl CGI module, aka CGI.pm) through environment variables and STDIN: header fields are passed via environment variables, and the message body via STDIN.
In Perl, I think you need this to read those environment variables: http://perldoc.perl.org/Env.html
And this to read stdin: https://perlmaven.com/read-from-stdin
From there on, you can process as needed.
BE CAREFUL when reading any of these. You can be sent malformed input, like 100 GB of "valid" data in one of the HTTP headers or in the message body, which can wreak havoc on your script or lead to dangerous system calls, etc. Sanitizing is necessary before passing the data anywhere else.
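As a rough sketch of that approach (not a hardened implementation), a script can read CONTENT_LENGTH from the environment and then pull exactly that many bytes from STDIN:
#!/usr/bin/perl
use strict;
use warnings;

# CONTENT_LENGTH is set by the web server for POST requests.
my $len = $ENV{CONTENT_LENGTH} || 0;

# Refuse absurdly large bodies before reading anything (limit is arbitrary).
die "Request body too large\n" if $len > 1_000_000;

my $body = '';
read(STDIN, $body, $len) if $len;

print "Content-Type: text/plain\r\n\r\n";
print "Received $len bytes:\n$body\n";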

How to get URLencoded data from the body of a POST in CGI Perl

POSTDATA is not the correct answer. I have read the docs and still don't see how I can get the data.
I want to receive this request:
POST /cgi-bin/myscript.cgi HTTP/1.1
Host: myhost.com
Content-Length: 3
Content-Type: application/x-www-form-urlencoded
255
and have the server respond
You sent the string "255"
Please assist, I am a Perl beginner and have gotten a bunch of seemingly wrong and useless answers to this seemingly simple request.
CGI will automatically parse form data, so you need to hide the fact that what you received is form data (or at least claims to be).
use CGI qw( );
$ENV{CONTENT_TYPE} = 'application/octet-stream';
my $cgi = CGI->new();
my $post_data = $cgi->param('POSTDATA');
Better solution: Have the requester use a correct content type (e.g. application/octet-stream), or have the requester actually send form data (e.g. data=255).
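For the second suggestion, a quick sketch (the field name data is only an example): if the client sends data=255 with the usual form content type, CGI parses it for you and the script can echo it back:
use strict;
use warnings;
use CGI;

my $cgi   = CGI->new;
my $value = $cgi->param('data');   # "255" if the client sent data=255

print $cgi->header('text/plain');
print qq{You sent the string "$value"\n};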
The only solution for me was to change the Content-Type on the client's request to 'application/octet-stream'.
The CGI module documentation on CPAN says:
If POSTed data is not of type application/x-www-form-urlencoded or
multipart/form-data, then the POSTed data will not be processed, but
instead be returned as-is in a parameter named POSTDATA.
So if you can't change the Content-Type the client sends, the data will be processed as form parameters and POSTDATA will not be set.
CGI (in recent versions at least) will stuff incorrectly encoded x-www-form-urlencoded params into a parameter named keywords. Better to send a proper content type, though; then POSTDATA works exactly as the docs say:
If POSTed data is not of type application/x-www-form-urlencoded or
multipart/form-data, then the POSTed data will not be processed...
use strictures;
use CGI::Emulate::PSGI;
use Plack::Test;
use HTTP::Request::Common;
use Test::More;

my $post = POST "/non-e-importa",
    "Content-Length" => 5,
    "Content-Type"   => "application/x-www-form-urlencoded",
    Content          => "ohai\n";

my $cgis = CGI::Emulate::PSGI->handler( sub {
    use CGI "param", "header";
    my $incorrectly_encoded_body = param("keywords");
    print header("text/plain"), $incorrectly_encoded_body;
});

test_psgi $cgis, sub {
    my $cb  = shift;
    my $res = $cb->($post);
    is $res->content, "ohai", "Soopersek437 param: keywords";
};

done_testing();

__END__
prove so-16846138 -v
ok 1 - Soopersek437 param: keywords
1..1
ok
All tests successful.
Result: PASS

CGI Perl Echo back POSTed application/x-www-form-urlencoded Content

I need a simple CGI based Perl script to receive a POST (directly, not from another HTML page) with Content-Type being application/x-www-form-urlencoded and to echo back
I received: (encoded string)
(and if possible)
decoded, the string is: (decoded string)
I am new to CGI Perl, and this is a one off request for testing a product (I'm a sysadmin. not a programmer). I intend to learn Perl more deeply in the future, but in this case I'm hoping for a gimme.
To start off, I will quickly skim some of the basics.
The following is the module used for a Perl CGI application:
use CGI;
To create CGI object:
my $web = CGI->new;
Make sure you set and write the HTTP headers to the output stream before flushing out any other CGI data. Otherwise you will end up with a 500 error.
To set the headers:
print $web->header();
print $web->header('application/x-www-form-urlencoded');
To receive any POST data from HTML, say for example,
http://example.com?POSTDATA=helloworld
you may use the param() function:
my $data = $web->param('POSTDATA');
The scalar $data would be set to "helloworld".
It is advisable to check if $web->param('POSTDATA') is defined before you assign to a scalar.
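Putting this together for the original request, here is a minimal sketch of a script that echoes back an application/x-www-form-urlencoded body. It reads STDIN directly so CGI.pm's form parsing never consumes the body, and uses URI::Escape for the decoded version:
#!/usr/bin/perl
use strict;
use warnings;
use URI::Escape qw(uri_unescape);

my $len  = $ENV{CONTENT_LENGTH} || 0;
my $body = '';
read(STDIN, $body, $len) if $len;

# Form encoding uses '+' for spaces, so translate those before unescaping.
(my $decoded = $body) =~ tr/+/ /;
$decoded = uri_unescape($decoded);

print "Content-Type: text/plain\r\n\r\n";
print qq{I received: $body\n};
print qq{decoded, the string is: $decoded\n};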

What is the preferred method of accessing WWW::Mechanize responses?

Are both of these versions OK, or is one of them preferable?
#!/usr/bin/env perl
use strict;
use warnings;
use WWW::Mechanize;
my $mech = WWW::Mechanize->new();
my $content;
# 1
$mech->get( 'http://www.kernel.org' );
$content = $mech->content;
print $content;
# 2
my $res = $mech->get( 'http://www.kernel.org' );
$content = $res->content;
print $content;
They are both acceptable. The second one seems cleaner to me because it returns a proper HTTP::Response object that you can query and call methods on, and it also means that if you make another Mechanize request, you'll still have access to the old HTTP response. With your first approach, each time you make a request, the content method will return something new, which sounds error-prone.
Btw, for either method, you should check $response->is_success or $mech->success before accessing the content, as the request may have failed.
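A quick sketch of that check, using die as a placeholder for whatever error handling you prefer:
my $res = $mech->get('http://www.kernel.org');
die 'Request failed: ' . $res->status_line . "\n" unless $res->is_success;
print $res->content;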
The content() method is sometimes more convenient:
$mech->content(...)
Returns the content that the mech uses internally for the last page fetched. Ordinarily this is the same as $mech->response()->content(), but this may differ for HTML documents if "update_html" is overloaded, and/or extra named arguments are passed to content():
$mech->content( format => 'text' )
Returns a text-only version of the page, with all HTML markup stripped. This feature requires HTML::TreeBuilder to be installed, or a fatal error will be thrown.
$mech->content( base_href => [$base_href|undef] )
Returns the HTML document, modified to contain a <base href="$base_href"> mark-up in the header. $base_href is $mech->base() if not specified. This is handy to pass the HTML to e.g. HTML::Display.
$mech->content is specifically there so you can bypass having to get the result response. The simpler it is, the better.

How can I detect the file type of image at a URL?

How do I find the file type of an image on a website from its URL in Perl?
For example,
$image_name = "logo";
$image_path = "http://stackoverflow.com/content/img/so/" . $image_name;
From this information, how can I find the file type? In this example it should display
"png"
since the full URL is http://stackoverflow.com/content/img/so/logo.png.
Suppose the site has more files, like the SO web site does; it should show all of the file types.
If you're using LWP to fetch the image, you can look at the content-type header returned by the HTTP server.
Both WWW::Mechanize and LWP::UserAgent will give you an HTTP::Response object for any GET request. So you can do something like:
use strict;
use warnings;
use WWW::Mechanize;
my $mech = WWW::Mechanize->new;
$mech->get( "http://stackoverflow.com/content/img/so/logo.png" );
my $type = $mech->response->headers->header( 'Content-Type' );
You can't easily tell. The URL doesn't necessarily reflect the type of the image.
To get the image type you have to make a request via HTTP (GET, or more efficiently, HEAD), and inspect the Content-type header in the HTTP response.
Well, https://stackoverflow.com/content/img/so/logo is a 404. If it were not, then you could use
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
my ($content_type) = head "https://stackoverflow.com/content/img/so/logo.png";
print "$content_type\n" if defined $content_type;
__END__
As Kent Fredric points out, what the web server tells you about content type need not match the actual content sent by the web server. Keep in mind that File::MMagic can also be fooled.
#!/usr/bin/perl
use strict;
use warnings;
use File::MMagic;
use LWP::UserAgent;

my $mm = File::MMagic->new;
my $ua = LWP::UserAgent->new(
    max_size => 1_000 * 1_024,
);

my $res = $ua->get('https://stackoverflow.com/content/img/so/logo.png');

if ( $res->code eq '200' ) {
    print $mm->checktype_contents( $res->content );
}
else {
    print $res->status_line, "\n";
}

__END__
You really can't make assumptions about content based on URL, or even content type headers.
They're only guides to what is being sent.
A handy trick to confuse things that use suffix matching to identify file-types is doing this:
http://example.com/someurl?q=foo#fakeheheh.png
And if you were to arbitrarily permit that image to be added to the page, it might in some cases be a doorway to an attack of some sort if the browser followed it. (For example, http://really_awful_bank.example.com/transfer?amt=1000000;from=123;to=123 )
Content-type based forgery is not so detrimental, but you can do nasty things if the person who controls the name works out how you identify things and sends different content types for HEAD requests than it does for GET requests.
It could tell the HEAD request that it's an image, but then tell the GET request that it's application/javascript, and goodness knows where that will lead.
The only way to know for certain what it is is to download the file and then do magic-based identification, or go further (i.e., try to decode the image). Then all you have to worry about is images that are too large, and specially crafted images that could trip vulnerabilities in computers that are not yet patched for that vulnerability.
Granted all of the above is extreme paranoia, but if you know the rare possibilities you can make sure they can't happen :)
From what I understand, you're not worried about the content type of an image you already know the name and extension for; you want to find the extension for an image you only know the base name of.
In order to do that, you'd have to test all the image extensions you care about individually and record which ones resolve and which ones don't. For example, both https://stackoverflow.com/content/img/so/logo.png and https://stackoverflow.com/content/img/so/logo.gif could exist. They don't in this exact situation, but on some arbitrary server you could have multiple images with the same base name but different extensions. Unfortunately there's no way to get a list of the available extensions of a file in a remote web directory by supplying just its base name without looping through the possibilities, as sketched below.
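Here is a rough sketch of that loop, probing a handful of common extensions with HEAD requests; the base URL and the extension list are just examples:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

my $base       = 'https://stackoverflow.com/content/img/so/logo';
my @extensions = qw(png gif jpg jpeg svg webp);

my $ua = LWP::UserAgent->new( timeout => 10 );

for my $ext (@extensions) {
    my $res = $ua->head("$base.$ext");
    printf "%-5s %s\n", $ext, $res->is_success ? 'exists' : 'not found';
}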