Downloading or requesting a page using a proxy list? - perl

I was wondering if that was possible to request an internet page from its server via a proxy taken from a proxy list.
I don't really know all the exact terms, so I'll just explain what I want: say there is a feature in a website which counts IPs or something alike (perhaps cookies), such as visitors counter. I'd like to "fool" it by "entering" the page using many proxies.
I could use something like Tor, but that's too much work - I only want to visit a page, let the counter or whatever in the page know that I visited, and that's all.
I don't really know which tags to add, but I had some little experiments with Perl so I think that could be a good direction, although I couldn't find a solution for my problem.
Thank you in advance.

You want something like this:
#/usr/bin/perl
use strict; use warnings;
use LWP::UserAgent;
my $url = shift || 'http://www.google.com';
my $a = LWP::UserAgent->new;
$a->agent('Mozilla/5.0');
$a->timeout(20);
while (<DATA>) {
$a->proxy( ['http'], $_ );
warn "Failed to get page with proxy $_\n"
unless $a->get( $url )->is_success;
}
__DATA__
http://85.214.142.3:8080
http://109.230.245.167:80
http://211.222.204.1:80
The code doesn't require much explanations. LWP::UserAgent allows specifying a proxy server.
Loop through a list of proxies, get the wanted page and you're done.

Related

Asynchronous Chat Server using Mojolicious

Hello Ladies and Gentleman! I am currently writing a minimalistic chat server that will somewhat resemble IRC. I am writing it in perl using the Mojolicious, but unfortunately have run into an issue. I have the following code:
#!/usr/bin/perl
use warnings;
use strict;
use Mojo::IOLoop::Server;
my $server = Mojo::IOLoop::Server->new;
$server->on(accept => sub {
my ($server, $handle) = #_;
my $data;
print $handle "Connected!\n";
while(1) {
$handle->recv($data, 4096);
if($data) {
print $server "$data";
}
}
});
$server->listen(port => $ARGV[0]);
$server->start;
$server->reactor->start unless $server->reactor->is_running;
Unfortunately, the print $server "$data"; line does not actually work. It gives off the error:
Mojo::Reactor::Poll: I/O watcher failed: Not a GLOB reference at ./server.pl line 20.
I have looked through the documentation for Mojolicious, but cannot find how to send the line I get from client A to the rest of the clients connected.
While $handle is something like a stream you can write on, $server is a Mojo::IOloop::Server object, so it's not a surprise you can't write on it like you're trying to do.
Even if I use Mojolicious quite often, I'm not familiar with every possibilities (there are a lot), but here what I would suggest : you need to store a list of all connected clients (in a hash or an array for instance), and when you receive a message, you iterate through that client list, to send the message to all of them.
You also need a way (not hard to do) to delete clients from your clients list when they disconnect.
Also I'm not quite sure about your infinite loop : I wouldn't be surprised if it was blocking the server on the 1st connected client.
It's better to use Mojolicious functions to do so :
$serv->on(message => sub { send the message to all clients });
And that function would be called every time a message is received.
Here is a good example, using Mojolicious::Light, pretty easy to understand I think : https://github.com/kraih/mojo/wiki/Writing-websocket-chat-using-Mojolicious-Lite

LWP getstore usage

I'm pretty new to Perl. While I just created a simple scripts to retrieve a file with
getstore($url, $file);
But how do I know whether the task is done correctly or the connection interrupted in the middle, or authentication failed, or whatever response. I searched all the web and I found some, like a response list, and some talking about useragent stuff, which I totally can't understand, especially the operator $ua->.
What I wish is to an explanation about that operator stuff (I don't even know what -> used for), and the RC code meaning, and finally, how to use it.
Its a lot of stuff so I appreciate any answer given, even just partially. And, thanks first for whoever will to help. =)
The LWP::Simple module is just that: quite simplistic. The documentation states that the getstore function returns the HTTP status code which we can save into a variable. There are also the is_success and is_error functions that tell us whether a certain return value is ok or not.
my $url = "http://www.example.com/";
my $filename = "some-file.html";
my $rc = getstore($url, $filename)
if (is_error($rc)) {
die "getstore of <$url> failed with $rc";
}
Of course, this doesn't catch errors with the file system.
The die throws a fatal exception that terminates the execution of your script and displays itself on the terminal. If you don't want to abort execution use warn.
The LWP::Simple functions provide high-level controls for common tasks. If you need more control over the requests, you have to manually create an LWP::UserAgent. An user agent (abbreviated ua) is a browser-like object that can make requests to servers. We have very detailed control over these requests, and can even modify the exact header fields.
The -> operator is a general dereference operator, which you'll use a lot when you need complex data structures. It is also used for method calls in object-oriented programming:
$object->method(#args);
would call the method on $object with the #args. We can also call methods on class names. To create a new object, usually the new method is used on the class name:
my $object = The::Class->new();
Methods are just like functions, except that you leave it to the class of the object to figure out which function exactly will be called.
The normal workflow with LWP::UserAgent looks like this:
use LWP::UserAgent; # load the class
my $ua = LWP::UserAgent->new();
We can also provide named arguments to the new method. Because these UA objects are robots, it is considered good manners to tell everybody who sent this Bot. We can do so with the from field:
my $ua = LWP::UserAgent->new(
from => 'ss-tangerine#example.com',
);
We could also change the timeout from the default three minutes. These options can also be set after we constructed a new $ua, so we can do
$ua->timeout(30); # half a minute
The $ua has methods for all the HTTP requests like get and post. To duplicate the behaviour of getstore, we first have to get the URL we are interested in:
my $url = "http://www.example.com/";
my $response = $ua->get($url);
The $response is an object too, and we can ask it whether it is_success:
$response->is_success or die $response->status_line;
So if execution flows past this statement, everything went fine. We can now access the content of the request. NB: use the decoded_content method, as this manages transfer encodings for us:
my $content = $response->decoded_content;
We can now print that to a file:
use autodie; # automatic error handling
open my $fh, ">", "some-file.html";
print {$fh} $content;
(when handling binary files on Windows: binmode $fh after opening the file, or use the ">:raw" open mode)
Done!
To learn about LWP::UserAgent, read the documentation. To learn about objects, read perlootut. You can also visit the perl tag on SO for some book suggestions.

Simple Perl Proxy

We store a large amount of files on Amazon S3 that we want website visitors to be able to access via AJAX but we don't want the actual file locations disclosed to visitors.
To accomplish this what I'm hoping to do is to make an AJAX request to a very simple perl script that would simply act as a proxy and return the file to the browser. I already have the script setup to authenticate that the user is logged in and do a database query to figure out the correct url to access the file on S3 but I'm not sure the best way to return the file to the vistor's browser in the most efficient manner.
Any suggestions on the best way to accomplish this would be greatly appreciated. Thanks!
The best way is to use the sendfile system call. If you're opening and reading the file from disk manually and then again write it blockwise to the "sink" end of your Web framework, then you're very wasteful because the data have to travel through the RAM, possibly including buffering.
What you describe in your question is a very common pattern, therefore many solutions already exist around the idea of just setting a special HTTP header, then letting the Web stack below your application deal with it efficiently.
mod_xsendfile for Apache httpd
in lighttpd
X-Accel-Redirect for nginx
Employ the XSendfile middleware in Plack to set the appropriate header. The following minimal program will DTRT and take advantage of the system call where possible.
use IO::File::WithPath qw();
use Plack::Builder qw(builder enable);
builder {
enable 'Plack::Middleware::XSendfile';
sub {
return [200, [], IO::File::WithPath->new('/usr/src/linux/COPYING')];
}
};
Ok. There's example how to implement this using Mojolicious framework.
I suppose you run this script as daemon. Script catches all requests to /json_dir/.*, this request to Stackoverflow API and returns response.
You may run this script as ./example.pl daemon and then try http://127.0.0.1:3000/json_dir/perl
In response you should be able to find your own question titled 'Simple Perl Proxy'.
This code could be used as standalone daemon that listen on certain port and as CGI script (first preferred).
#!/usr/bin/env perl
use Mojolicious::Lite;
get '/json_dir/(.filename)' => sub {
my $self = shift;
my $filename = $self->stash('filename');
my $url = "http://api.stackoverflow.com/1.1/questions?tagged=" . $filename;
$self->ua->get(
$url => sub {
my ($client, $tx) = #_;
json_response($self, $tx);
}
);
$self->render_later;
};
sub json_response {
my ($self, $tx) = #_;
if (my $res = $tx->success) {
$self->tx->res($res);
}
else {
$self->render_not_found;
}
$self->rendered;
}
app->start;
__DATA__
## not_found.html.ep
<!doctype html><html>
<head><title>Not Found</title></head>
<body>File not found</body>
</html>

Adding a custom login page to awstats

I am trying to setup a login page to be used with awstats, so that the content is only viewable by authenticated users.
Ideally, I would like to create my own login page, and if a user is not logged in when the visit the stats page, they are redirected to the login page. (Right now there is no authentication)
The problem is that I don't know how to implement this. I have tried googling this, but the only solutions I could find were to use .htaccess (which I would rather not use in this case if I don't have to)
Has anyone implemented something similar to this?
.htaccess is the right tool for this job, but if you insist, the ancient ancient ancient way
#!/usr/bin/perl --
use strict;
use warnings;
use CGI;
Main( #ARGV );
exit( 0 );
sub Main {
my ( $q ) = CGI->new;
if( $q->param('password') eq 'secret' ){
print ShowAWSTATS($q);
} else {
print ShowLoginForm($q);
}
}
where ShowLoginForm() prints a content header $q->header along with the html for a login form, and ShowAWSTATS prints a content header, and say, some html as provided by awstats.pl
Like Len Jaffe says, there is much much much more that needs to be done, so you want to use .htaccess (its either 3min with .htaccess or hours with anything else)

How can I fill in web forms with Perl?

I want to fill in a web form with Perl. I am having trouble finding out the correct syntax to accomplish this. As in, how do I go to the URL, select the form, fill in the form, and then press enter to be sure it has been submitted?
Something like WWW::Mechanize::FormFiller?
WWW::Mechanize and its friends are the way to go. There are several examples in Spidering Hacks, but you'll also find plenty more by googling for the module name.
Good luck, :)
Start with WWW::Mechanize::Shell:
perl -MWWW::Mechanize::Shell -e shell
get http://some/page
fillout
...
submit
Afterwards, type "script", and save generated code as something.pl - and that's about it. It's done.
Request the form's action URL with Net::HTTP or something (can't recall the exact module), and include the forms fields as a GET/POST parameter (whichever the form calls for).
HTML::Form works nicely, too.
The synopsis of the module is an excellent example:
use HTML::Form;
$form = HTML::Form->parse($html, $base_uri);
$form->value(query => "Perl");
use LWP::UserAgent;
$ua = LWP::UserAgent->new;
$response = $ua->request($form->click);