Perl CGI with HTTP Status Codes - perl

I have the following validation in a CGI script that will check for the GET method and return a 405 HTTP status code if the GET method is not used. Unfortunately it is still returning a 200 Status OK when using POST or PUT.
my ($buffer);
# Read in text
$ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/;
if ($ENV{'REQUEST_METHOD'} eq "GET")
{
$buffer = $ENV{'QUERY_STRING'};
}
else
{
$cgi->$header->status('405 Method Not Allowed')
print $cgi->header('text/plain');
}
I am still new to CGI programming so I figured someone here could toss me a bone about working with CGI and HTTP status returns. If a good CGI doc is provided that would be awesome, as most returned by search are CPAN (already read a few times) and really old tutorials that are not Object oriented.

cpan docs is more than enought for CGI. If you want new tutorials don't use CGI, use one of MVC frameworks ( Catalyst, Dancer2, Mojo, etc ).
You can post 405 header if will change:
$cgi->$header->status('405 Method Not Allowed');
print $cgi->header('text/plain');
to this:
print $cgi->header(
-type=>'text/plain',
-status=> '405 Method Not Allowed'
);

Related

Getting the Error 403 Forbidden while posting the website address through Perl

use strict;
use LWP::UserAgent;
my $UserAgent = LWP::UserAgent->new;
my $response = $UserAgent->get("https://scholar.google.co.in/scholar_lookup?author=N.+R.+Alpert&author=S.+A.+Mohiddin&author=D.+Tripodi&author=J.+Jacobson-Hatzell&author=K.+Vaughn-Whitley&author=C.+Brosseau+&publication_year=2005&title=Molecular+and+phenotypic+effects+of+heterozygous,+homozygous,+and+compound+heterozygote+myosin+heavy-chain+mutations&journal=Am.+J.+Physiol.+Heart+Circ.+Physiol.&volume=288&pages=H1097-H1102");
if ($response->is_success)
{
$response->content =~ /<title>(.*?) - Google Scholar<\/title>/;
print $1;
}
else
{
die $response->status_line;
}
I am getting the below error while running this script.
403 Forbidden at D:\Getelement.pl line 52.
I have pasted this website address in address bar, and its redirecting exactly to that site, but its not working in while running by script.
Can you please help me on this issue.
Google Terms of Service disallow automated searches. They are
detecting you're sending this from a script because your headers and
your browser standard headers are very different, and you can analyze
them if you want.
In the old times they had a SOAP API, and you could use modules like
WWW::Search::Google but that's not the case anymore because this
API was deprecated.
Alternatives were already discussed in the following StackOverflow
question:
What are the alternatives now that the Google web search API has
been deprecated?
Google has blacklisted LWP::UserAgent They either blacklisted the UserAgent or parts of the request (headers whatsoever).
I suggest you use Mojo::UserAgent.. The request looks like by default more like a browser. You must write minimum 1 lines of code.
use Mojo::UserAgent;
use strict;
use warnings;
print Mojo::UserAgent->new->get('https://scholar.google.co.in/scholar_lookup?author=N.+R.+Alpert&author=S.+A.+Mohiddin&author=D.+Tripodi&author=J.+Jacobson-Hatzell&author=K.+Vaughn-Whitley&author=C.+Brosseau+&publication_year=2005&title=Molecular+and+phenotypic+effects+of+heterozygous,+homozygous,+and+compound+heterozygote+myosin+heavy-chain+mutations&journal=Am.+J.+Physiol.+Heart+Circ.+Physiol.&volume=288&pages=H1097-H1102')->res->dom->at('title')->text;
# Prints Molecular and phenotypic effects of heterozygous, homozygous, and
# compound heterozygote myosin heavy-chain mutations - Google Scholar
Terms
The code does not accept any terms nor additional lines has been added to bypass security checks. It's absolutely fine.
You can fetch your content if you add a User Agent string to identify yourself to the web server:
...
my $UserAgent = LWP::UserAgent-new;
$UserAgent->agent('Mozilla/5.0'); #...add this...
...
print $1;
...
This prints: "Molecular and phenotypic effects of heterozygous, homozygous, and compound heterozygote myosin heavy-chain mutations"

how to know which http action called from rest client at server side

When a request was made(actions like GET POST PATCH) to server through a rest client like LWP or REST::Client or HTTP::Request. how can we decode the request so that we will get the actual method which is called from client. if we can get the action we will process or respond to client accordingly.
this way i am able to get headers, all params sent in post request.
my $q = CGI->new;
my $input = $q->param( 'POSTDATA' ); # for content
my %headers = map { $_ => $q->http($_) } $q->http();
print $q->header('text/plain');
print "Got the following headers:\n";
for my $header ( keys %headers ) {
print "$header: $headers{$header}\n";
}
Now my question is how to receive the action like GET or POST.
From the docs
request_method()
Returns the method used to access your script, usually one of 'POST', 'GET' or 'HEAD'.
Also from the docs:
CGI.pm is no longer considered good practice for developing web applications, including quick prototyping and small web scripts. There are far better, cleaner, quicker, easier, safer, more scalable, more extensible, more modern alternatives available at this point in time. These will be documented with CGI::Alternatives.

Better way to proxy an HTTP request using Perl HTTP::Response and LWP?

I need a Perl CGI script that fetches a URL and then returns the result of the fetch - the status, headers and content - unaltered to the CGI environment so that the "proxied" URL is returned by the web server to the user's browser as if they'd accessed the URL directly.
I'm running my script from cgi-bin in an Apache web server on an Ubuntu 14.04 host, but this question should be independent of server platform - anything that can run Perl CGI scripts should be able to do it.
I've tried using LWP::UserAgent::request() and I've got very close. It returns an HTTP::Response object that contains the status code, headers and content, and even has an "as_string" method that turns it into a human-readable form. The problem from a CGI perspective is that "as string" converts the status code to "HTTP/1.1 200 OK" rather than "Status: 200 OK", so the Apache server doesn't recognise the output as a valid CGI response.
I can fix this by using other methods in HTTP::Response to split out the various parts, but there seems to be no public way of getting at the encapsulated HTTP::Headers object in order to call its as_string method; instead I have to hack into the Perl blessed object hash and yank out the private "_headers" member directly. To me this seems slightly evil, so is there a better way?
Here's some code to illustrate the above. If you put it in your cgi-bin directory then you can call it as
http://localhost/cgi-bin/lwp-test?url=http://localhost/&http-response=1&show=1
You can use a different URL for testing if you want. If you set http-response=0 (or drop the param altogether) then you get the working piece-by-piece solution. If you set show=0 (or drop it) then the proxied request is returned by the script. Apache will return the proxied page if you have http-response=0 and will choke with a 500 Internal Server Error if it's 1.
#!/usr/bin/perl
use strict;
use warnings;
use CGI::Simple;
use HTTP::Request;
use HTTP::Response;
use LWP::UserAgent;
my $q = CGI::Simple->new();
my $ua = LWP::UserAgent->new();
my $req = HTTP::Request->new(GET => $q->param('url'));
my $res = $ua->request($req);
# print a text/plain header if called with "show=1" in the query string
# so proxied URL response is shown in browser, otherwise just output
# the proxied response as if it was ours.
if ($q->param('show')) {
print $q->header("text/plain");
print "\n";
}
if ($q->param('http-response')) {
# This prints the status as "HTTP/1.1 200 OK", not "Status: 200 OK".
print $res->as_string;
} else {
# This works correctly as a proxy, but using {_headers} to get at
# the private encapsulated HTTP:Response object seems a bit evil.
# There must be a better way!
print "Status: ", $res->status_line, "\n";
print $res->{_headers}->as_string;
print "\n";
print $res->content;
}
Please bear in mind that this script was written purely to demonstrate how to forward an HTTP::Response object to the CGI environment and bears no resemblance to my actual application.
You can go around the internals of the response object at $res->{_headers} by using the $res->headers method, that returns the actual HTTP::Headers instance that is used. HTTP::Response inherits that from HTTP::Message.
It would then look like this:
print "Status: ", $res->status_line, "\n";
print $res->headers->as_string;
That looks less evil, though it's still not pretty.
As simbabque pointed out, HTTP::Response has a headers method through inheritance from HTTP::Message. We can tidy up the handling of the status code by using HTTP::Response->header to push it into the embedded HTTP::Headers object, then use headers_as_string to print out the headers more cleanly. Here's the final script:-
#!/usr/bin/perl
use strict;
use warnings;
use CGI::Simple;
use HTTP::Request;
use HTTP::Response;
use LWP::UserAgent;
my $q = CGI::Simple->new();
my $ua = LWP::UserAgent->new();
my $req = HTTP::Request->new(GET => $q->param('url'));
my $res = $ua->request($req);
# print a text/plain header if called with "show=1" in the query string
# so proxied URL response is shown in browser, otherwise just output
# the proxied response as if it was ours.
if ($q->param('show')) {
print $q->header("text/plain");
}
# $res->as_string returns the status in a "HTTP/1.1 200 OK" line rather than
# a "Status: 200 OK" header field so it can't be used for a CGI response.
# We therefore have a little more work to do...
# convert status from line to header field
$res->header("Status", $res->status_line);
# now print headers and content - don't forget a blank line between the two
print $res->headers_as_string, "\n", $res->content;

CGI Perl Echo back POSTed application/x-www-form-urlencoded Content

I need a simple CGI based Perl script to receive a POST (directly, not from another HTML page) with Content-Type being application/x-www-form-urlencoded and to echo back
I received: (encoded string)
(and if possible)
decoded, the string is: (decoded string)
I am new to CGI Perl, and this is a one off request for testing a product (I'm a sysadmin. not a programmer). I intend to learn Perl more deeply in the future, but in this case I'm hoping for a gimme.
To start off, I will quickly skim some of the basics.
Following is the package for PERL/CGI application:
use CGI;
To create CGI object:
my $web = CGI->new;
Make sure you set and then write HTTP headers to outstream, before flushing out any CGI data to outstream. Otherwise you would end up in 500 error.
To set the headers:
print $web->header();
print $web->header('application/x-www-form-urlencoded');
To receive any post data from HTML, say for example,
http://example.com?POSTDATA=helloworld
you may use param() function:
my $data = $web->param('POSTDATA');
scalar $data would be set with "helloworld".
It is advisable to check if $web->param('POSTDATA') is defined before you assign to a scalar.

Download only new/modified files with perl (or wget)

I have a Perl script which downloads a large number of files from a remote server. I'd like to avoid hammering the server, so I'd like to avoid downloading a file if it hasn't been modified since my last check. Is there a good way to do this, either in Perl or with a shell script?
Can I get the server to send HTTP 304 rather than HTTP 200 for unmodified files?
Yes, use LWP::UserAgent and pay special attention to the mirror method. This is also available in the procedural LWP::Simple as the mirror function.
From LWP's POD:
This method will get the document identified by $url and store it in file called $filename. If the file already exists, then the request will contain an "If-Modified-Since" header matching the modification time of the file. If the document on the server has not changed since this time, then nothing happens. If the document has been updated, it will be downloaded again. The modification time of the file will be forced to match that of the server.
The return value is the the response object.
HTTP 304 is the response code the server will return if you pass the If-Modified-Since test and your copy is fresh. LWP does this internally with mirror -- you needn't worry about it.
This is based on Evan Carrol's answer, but I'm going to elaborate in case this is useful for someone else. I stubbed out the response section; I doubt that part of my code will be interesting.
#!/usr/bin/perl -w
require HTTP::Date;
require LWP::UserAgent;
require Date::Parse;
my $lastChecked = '2009-01-01';
my $ua = LWP::UserAgent->new;
$ua->default_header('If-Modified-Since' => HTTP::Date::time2str(Date::Parse::str2time($lastChecked)));
my $response = $ua->get('http://example.com/');
if ($response->code == 304) {
print "No changes.\n";
} elsif ($response->is_success) {
print $response->decoded_content;
} else {
print "Response was error " . $response->code . ": '" . $response->status_line . "'\n";
}