mod_perl redirect

So I'm working in a mod_perl environment, and I want to know the best way to redirect to a new URL. I know that in CGI Perl you print a "Location: ..." header; however, I've come to find that there are usually better ways to do things in mod_perl, but I can't seem to find anything. Thanks in advance!

use Apache2::Const -compile => qw(REDIRECT);

sub handler {
    my $r = shift;
    $r->headers_out->set( Location => $url );   # $url holds the target URL
    $r->status(Apache2::Const::REDIRECT);       # 302
}
This is how to properly redirect in mod_perl 2.
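For completeness, a fuller sketch of such a handler follows; the package name and target URL are placeholders, and it assumes the handler is wired up with a PerlResponseHandler directive:

package My::Redirect;   # placeholder package name
use strict;
use warnings;
use Apache2::RequestRec ();
use APR::Table ();
use Apache2::Const -compile => qw(REDIRECT);

sub handler {
    my $r = shift;

    my $url = 'http://example.com/new-location';   # placeholder target

    $r->headers_out->set( Location => $url );
    return Apache2::Const::REDIRECT;               # Apache sends the 302 for us
}

1;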

Related

Perl - mechanize

I have the following code that works just fine.
#!/usr/bin/perl -w
use strict;
use LWP 6.03;
use URI;

my $browser = LWP::UserAgent->new;
my $url = URI->new('http://www.google.com/search');
$url->query_form(
    'h1'  => 'en',
    'num' => '100',
    'q'   => 'glass',
);
my $response = $browser->get($url,
    'User-Agent'      => 'Mozilla/4.76 [en] (win98; U)',
    'Accept'          => 'image/gif, image/x-bitmap, image/jpeg, image/pjpeg, image/png, */*',
    'Accept-Charset'  => 'iso-8859-1,*',
    'Accept-Language' => 'en-US',
);
if ($response->content =~ m/glass/i) {
    print "Success";
    open(GGLASS, ">gglass");
    print GGLASS $response->content;
} else {
    print "complete failure";
}
I have another piece of code that also works fine.
It uses the following:
#!/usr/bin/perl -w
use strict;
use WWW::Mechanize;
use HTML::TokeParser;
When I look up the documentation for my code on CPAN, it tells me that the libraries I am using are deprecated. Even though the code works on my system, the style of programming is being abandoned. The documentation points me to something I have never used, and I do not know if that is soon to be abandoned as well. What is the popular way to scrape a website? I do not want to be considered a dinosaur, or be stuck with antiquated or remedial programs and tactics that leave me in the previous century. If you could come up with a piece of code similar to the first example, that would be nice; that way I could compare the two.
Your documentation is wrong. None of LWP, URI, WWW::Mechanize, or HTML::TokeParser is deprecated. Mechanize works just fine in general for crawling. I would replace HTML::TokeParser with something that handles HTML parsing in a declarative fashion, though: Web::Query is splendid, and HTML::TreeBuilder::XPath is nice.
However, concerning your code example: Google's terms of use forbid scraping. Use their API instead!
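For comparison with the first example, here is a minimal fetch-and-extract sketch using WWW::Mechanize together with Web::Query for declarative parsing. The URL and the selector are placeholders, and per the note above you should not point it at Google search.

#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
use Web::Query qw(wq);

# Placeholder URL; use a site whose terms of use allow scraping.
my $mech = WWW::Mechanize->new( agent => 'Mozilla/5.0' );
$mech->get('http://example.com/');

# Parse the fetched HTML declaratively with CSS selectors.
my $q = wq( $mech->content );
$q->find('a')->each(sub {
    my ( $i, $elem ) = @_;
    printf "%d: %s -> %s\n", $i, $elem->text, $elem->attr('href') // '';
});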

How to intercept HTTP Request in Perl?

I am developing a web application, and I am wondering if there is a way, or some kind of module, that can intercept or handle any HTTP request before the response is sent to the client. I can't modify the httpd.conf file, so that isn't an option.
I want to do this to add some security to my web app by denying access, redirecting to other pages, or modifying the response sent to the client, among other things.
I've also heard about request dispatching, and maybe it could help me.
Does anybody know how to achieve this?
Perhaps you can use Plack::Handler::Apache2 to enable PSGI support. From there, you can use PSGI middleware modules to modify both the request and the response.
It's hard to get more specific without knowing how you've setup Perl to be executed in your mod_perl environment.
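As a rough illustration, here is a minimal sketch of inline PSGI middleware wrapped around an app with Plack::Builder. The /admin rule and the added header are hypothetical, and the response-modifying part assumes a plain arrayref response. Save it as app.psgi and run it with plackup.

use strict;
use warnings;
use Plack::Builder;

# A trivial PSGI app standing in for your real application.
my $app = sub {
    my $env = shift;
    return [ 200, [ 'Content-Type' => 'text/plain' ], ["Hello\n"] ];
};

builder {
    # Inline middleware: runs before and after the wrapped app.
    enable sub {
        my $app = shift;
        sub {
            my $env = shift;

            # Deny access based on the request (hypothetical rule).
            if ( $env->{PATH_INFO} =~ m{^/admin} ) {
                return [ 403, [ 'Content-Type' => 'text/plain' ], ["Forbidden\n"] ];
            }

            my $res = $app->($env);

            # Modify the response on the way out (assumes an arrayref response).
            push @{ $res->[1] }, 'X-Frame-Options' => 'DENY';
            return $res;
        };
    };
    $app;
};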
You may want to check out HTTP::Proxy, a Perl module for writing a web proxy. An example of this:
use HTTP::Proxy;

# alternate initialisation
my $proxy = HTTP::Proxy->new;
$proxy->port( 3128 ); # the classical accessors are here!

# this is a MainLoop-like method
$proxy->start;

A lower-level alternative is a plain HTTP::Daemon request loop:

use HTTP::Daemon;
use HTTP::Status qw(RC_FORBIDDEN);

my $d = HTTP::Daemon->new || die;
while (my $c = $d->accept) {
    while (my $r = $c->get_request) {
        if ($r->method eq 'GET' and $r->uri->path eq "/xyzzy") {
            # remember, this is *not* recommended practice :-)
            $c->send_file_response("/home/hd1/.zshrc");
        }
        else {
            $c->send_error(RC_FORBIDDEN);
        }
    }
    $c->close;
    undef($c);
}
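Since the goal is to deny access or modify responses, HTTP::Proxy's filter mechanism is also worth a look. A minimal sketch, assuming HTTP::Proxy::HeaderFilter::simple is installed; the added header is purely illustrative:

use strict;
use warnings;
use HTTP::Proxy;
use HTTP::Proxy::HeaderFilter::simple;

my $proxy = HTTP::Proxy->new( port => 3128 );

# Tag every response passing through the proxy (illustrative header).
$proxy->push_filter(
    response => HTTP::Proxy::HeaderFilter::simple->new(
        sub {
            my ( $self, $headers, $message ) = @_;
            $headers->header( 'X-Filtered-By' => 'HTTP-Proxy' );
        }
    ),
);

$proxy->start;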

Site scraping in perl using WWW::Mechanize

I have used WWW::Mechanize in Perl for a site scraping application.
I have faced some difficulties when logging in to a particular site via WWW::Mechanize. I have gone through some examples of WWW::Mechanize, but I couldn't figure out my issue.
I have put my code below.
#!/usr/bin/perl -w
use strict;
use WWW::Mechanize;
use HTTP::Cookies;
use Crypt::SSLeay;

my $agent = WWW::Mechanize->new( noproxy => 0 );
$agent->cookie_jar( HTTP::Cookies->new() );
$agent->agent('Mozilla/5.0');
$agent->proxy( ['https', 'http', 'ftp'], 'http://proxy.rcapl.com:3128' );
$agent->get("http://www.facebook.com");
my $re = $agent->submit_form(
    form_number => 1,
    fields      => {
        Email  => 'xyz@gmail.com',
        Passwd => 'xyz',
    },
);
print $re->content();
When I run the code, it says:
Error POSTing https://www.facebook.com/login.php?login_attempt=1: Not Implemented at ./test.pl line 11
Can anybody tell me what's going wrong in the code? Do I need to set all the parameters which Facebook sends for login?
The proxy is faulty:
Error GETing http://www.facebook.com: Can't connect to proxy.rcapl.com:3128 (Bad hostname) at so11406791.pl line 11.
The program works for me without calling the proxy method. Remove this.
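For reference, the same setup without the proxy call might look like the sketch below; the e-mail and password are placeholders, and Facebook may still reject a scripted login:

#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Cookies;

my $agent = WWW::Mechanize->new();
$agent->cookie_jar( HTTP::Cookies->new() );
$agent->agent('Mozilla/5.0');
$agent->get('http://www.facebook.com');

my $re = $agent->submit_form(
    form_number => 1,
    fields      => {
        Email  => 'xyz@gmail.com',   # placeholder credentials
        Passwd => 'xyz',
    },
);
print $re->content();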

Simple Perl Proxy

We store a large amount of files on Amazon S3 that we want website visitors to be able to access via AJAX but we don't want the actual file locations disclosed to visitors.
To accomplish this, what I'm hoping to do is make an AJAX request to a very simple Perl script that would simply act as a proxy and return the file to the browser. I already have the script set up to authenticate that the user is logged in and to do a database query to figure out the correct URL for the file on S3, but I'm not sure of the best way to return the file to the visitor's browser in the most efficient manner.
Any suggestions on the best way to accomplish this would be greatly appreciated. Thanks!
The best way is to use the sendfile system call. If you're opening and reading the file from disk manually and then writing it blockwise to the "sink" end of your web framework, you're being very wasteful, because the data has to travel through RAM, possibly with additional buffering.
What you describe in your question is a very common pattern, therefore many solutions already exist around the idea of just setting a special HTTP header, then letting the Web stack below your application deal with it efficiently.
mod_xsendfile for Apache httpd
X-Sendfile in lighttpd
X-Accel-Redirect for nginx
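For example, with mod_xsendfile loaded and XSendFile enabled in Apache, a plain CGI script only has to emit the header and the server streams the file itself. A minimal sketch; the path is a placeholder standing in for your database lookup:

#!/usr/bin/perl
use strict;
use warnings;

# Placeholder path resolved from your authentication check and database query.
my $path = '/var/files/report.pdf';

print "X-Sendfile: $path\r\n";
print "Content-Type: application/pdf\r\n";
print "\r\n";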
Employ the XSendfile middleware in Plack to set the appropriate header. The following minimal program will DTRT and take advantage of the system call where possible.
use IO::File::WithPath qw();
use Plack::Builder qw(builder enable);

builder {
    enable 'Plack::Middleware::XSendfile';
    sub {
        # The body object knows its own file system path, so the middleware
        # can hand it off to the server via the appropriate header.
        return [200, [], IO::File::WithPath->new('/usr/src/linux/COPYING')];
    }
};
OK, here's an example of how to implement this using the Mojolicious framework.
I suppose you run this script as a daemon. The script catches all requests to /json_dir/.*, proxies each request to the Stack Overflow API, and returns the response.
You may run this script as ./example.pl daemon and then try http://127.0.0.1:3000/json_dir/perl
In the response you should be able to find your own question, titled 'Simple Perl Proxy'.
This code can be used as a standalone daemon that listens on a certain port, or as a CGI script (the former is preferred).
#!/usr/bin/env perl
use Mojolicious::Lite;

get '/json_dir/(.filename)' => sub {
    my $self     = shift;
    my $filename = $self->stash('filename');
    my $url      = "http://api.stackoverflow.com/1.1/questions?tagged=" . $filename;
    $self->ua->get(
        $url => sub {
            my ($client, $tx) = @_;
            json_response($self, $tx);
        }
    );
    $self->render_later;
};

sub json_response {
    my ($self, $tx) = @_;
    if (my $res = $tx->success) {
        $self->tx->res($res);
    }
    else {
        $self->render_not_found;
    }
    $self->rendered;
}

app->start;
__DATA__
@@ not_found.html.ep
<!doctype html><html>
<head><title>Not Found</title></head>
<body>File not found</body>
</html>

Downloading or requesting a page using a proxy list?

I was wondering whether it is possible to request a web page from its server via a proxy taken from a proxy list.
I don't really know all the exact terms, so I'll just explain what I want: say there is a feature on a website which counts IPs or something alike (perhaps cookies), such as a visitor counter. I'd like to "fool" it by "entering" the page through many proxies.
I could use something like Tor, but that's too much work. I only want to visit a page, let the counter or whatever is on the page know that I visited, and that's all.
I don't really know which tags to add, but I have had some little experiments with Perl, so I think that could be a good direction, although I couldn't find a solution for my problem.
Thank you in advance.
You want something like this:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

my $url = shift || 'http://www.google.com';

my $ua = LWP::UserAgent->new;
$ua->agent('Mozilla/5.0');
$ua->timeout(20);

while (<DATA>) {
    chomp;
    $ua->proxy( ['http'], $_ );
    warn "Failed to get page with proxy $_\n"
        unless $ua->get( $url )->is_success;
}

__DATA__
http://85.214.142.3:8080
http://109.230.245.167:80
http://211.222.204.1:80
The code doesn't require much explanation. LWP::UserAgent allows specifying a proxy server; loop through a list of proxies, get the wanted page, and you're done.