Scrape from .onion site using Web::Scraper - perl

Problem: Scrape from tor .onion site using Web::Scraper
I would like to modify my code to connect to a .onion site. I believe I need to go through Tor's SOCKS5 proxy, but I am unsure how to do that with Web::Scraper.
Existing code:
use Web::Scraper;
my $piratelink=$PIRATEBAYSERVER.'/search/' . $srstring . '%20'. 's'.$sval[1].'e'.$epinum.'/0/7/0';
my $purlToScrape = $piratelink;
my $ns = scraper {
process "td>a", 'mag[]' => '@href';
process "td>div>a", 'tor[]' => '@href';
process "td font.detDesc", 'sizerow[]' => 'TEXT';
};
my $mres = $ns->scrape(URI->new($purlToScrape));

Web::Scraper uses LWP if you pass a URI to scrape.
You can either fetch the HTML yourself with some other HTTP library that supports SOCKS, or set up an LWP::UserAgent that uses SOCKS and assign it to Web::Scraper's shared $Web::Scraper::UserAgent variable so the scraper uses it as its agent.
use strict;
use LWP::UserAgent;
use Web::Scraper;
# set up a LWP object with Tor socks address
my $ua = LWP::UserAgent->new(
agent => q{Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; YPC 3.2.0; .NET CLR 1.1.4322)},
);
$ua->proxy([qw/ http https /] => 'socks://localhost:9050'); # Tor proxy
$ua->cookie_jar({});
my $PIRATEBAYSERVER = 'http://uj3wazyk5u4hnvtk.onion';
my $srstring = 'photoshop';
my $piratelink=$PIRATEBAYSERVER.'/search/' . $srstring; # . '%20'. 's'.$sval[1].'e'.$epinum.'/0/7/0';
my $purlToScrape = $piratelink;
my $ns = scraper {
process "td>a", 'mag[]' => '@href';
process "td>div>a", 'tor[]' => '@href';
process "td font.detDesc", 'sizerow[]' => 'TEXT';
};
# override Scraper's UserAgent with our SOCKS LWP object
$Web::Scraper::UserAgent = $ua;
my $mres = $ns->scrape(URI->new($purlToScrape));
use Data::Dumper;
print Dumper($mres); # $mres is a hash reference, so dump its contents
Note: you will also need to install the CPAN module LWP::Protocol::socks, which adds the socks:// proxy scheme to LWP.
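The other option mentioned in this answer, fetching the HTML yourself and handing it to the scraper, might look like the sketch below. It assumes scrape() accepts an HTML string plus an optional base URL, as the Web::Scraper documentation describes. The helper name scrape_results and the inline HTML are made up for illustration; in real use the string would come from a SOCKS-capable client, for example $ua->get($url)->decoded_content.

```perl
use strict;
use warnings;
use Web::Scraper;

# Same kind of rule as in the question, limited to the magnet links.
my $ns = scraper {
    process "td>a", 'mag[]' => '@href';
};

# Hypothetical helper: scrape HTML that was fetched by any HTTP client,
# e.g. an LWP::UserAgent configured for the Tor SOCKS proxy.
sub scrape_results {
    my ($html, $base_url) = @_;
    return $ns->scrape($html, $base_url);
}

# Stand-in for a fetched page (illustrative markup only).
my $html = '<table><tr><td><a href="magnet:?xt=urn:btih:abc">x</a></td></tr></table>';
my $mres = scrape_results($html, 'http://uj3wazyk5u4hnvtk.onion/');
print "$_\n" for @{ $mres->{mag} || [] };
```

Keeping the fetch and the scrape separate also makes it easier to inspect the HTTP response (status, redirects) before parsing it.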

Related

LWP::UserAgent unable to establish 'keep-alive' connection

I am using LWP::UserAgent to fetch some content:
use LWP::UserAgent;
use LWP::ConnCache;
use LWP::Debug qw(+);
my $ua = LWP::UserAgent->new( conn_cache => 1);
my $cache = $ua->conn_cache(LWP::ConnCache->new( ));
$ua->conn_cache->total_capacity(undef);
$ua->cookie_jar({});
$ua->agent('Mozilla/5.0');
$ua->add_handler("request_send", sub { shift->dump; return });
push @{$ua->requests_redirectable}, 'GET';
$page = $ua->get('https://www.foo.com');
I tested the script and checked the request headers. They do not contain either of the key-value pairs below:
Keep-Alive 115
Connection keep-alive
Any input would be appreciated.
I believe you just need to specify LWP::UserAgent->new(keep_alive => $maxrequests) to enable keepalive. It will automatically set up the connection cache for you.
I don't see a way to make the number unlimited, though.
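As a minimal sketch of that suggestion: passing keep_alive to the constructor should create the connection cache for you, and its capacity can be read back from conn_cache. The value 10 is an arbitrary choice. The second part relies on the LWP::ConnCache documentation, which says a total_capacity of undef means no limit; treat that as an assumption to verify against your installed version.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# keep_alive => N asks LWP to create a connection cache
# that holds up to N open connections.
my $ua = LWP::UserAgent->new(keep_alive => 10);
print "capacity: ", $ua->conn_cache->total_capacity, "\n";

# Assumption (from the LWP::ConnCache docs): undef removes the limit.
my $unlimited = LWP::UserAgent->new(keep_alive => 1);
$unlimited->conn_cache->total_capacity(undef);
print "limit removed\n" unless defined $unlimited->conn_cache->total_capacity;
```

No request is sent here; the sketch only shows how the cache is configured.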

HTTP Basic Authentication in Asana with perl

I'm trying to use Asana API with HTTP Basic Auth. The following program prints
{"errors":[{"message":"Not Authorized"}]}
It seems that LWP doesn't send the auth credentials to the server.
#!/usr/bin/perl
use v5.14.0;
use LWP;
my $ua = new LWP::UserAgent;
$ua->credentials('app.asana.com:443', 'realm', 'api_key_goes_here' => '');
my $res = $ua->get("https://app.asana.com/api/1.0/users/me");
say $res->decoded_content;
I've run into something similar (on a completely different service), and couldn't get it working. I think it's to do with a realm/hostname mismatch.
As you note - if you hit that URL directly, from a web browser, you get the same answer (without an auth prompt).
But what I ended up doing instead:
my $request = HTTP::Request -> new ( 'GET' => 'https://path/to/surl' );
$request -> authorization_basic ( 'username', 'password' );
my $results = $user_agent -> request ( $request );
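The reason this works where credentials() did not is, as far as I can tell, that LWP only uses stored credentials after the server answers with a 401 challenge for a matching realm, while authorization_basic() puts the header on the request up front. The sketch below only builds the request and checks the header, so nothing is sent; the URL and key are the placeholders from the question.

```perl
use strict;
use warnings;
use HTTP::Request;
use MIME::Base64 qw(encode_base64);

# Placeholder URL and API key from the question: the key is used as the
# user name and the password is left empty.
my $request = HTTP::Request->new(GET => 'https://app.asana.com/api/1.0/users/me');
$request->authorization_basic('api_key_goes_here', '');

# The Authorization header exists before any request is sent,
# so no 401 round trip is needed.
my $expected = 'Basic ' . encode_base64('api_key_goes_here:', '');
print $request->header('Authorization'), "\n";
print "header matches\n" if $request->header('Authorization') eq $expected;

# To send it (not executed here):
# my $res = LWP::UserAgent->new->request($request);
```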

Cannot login with UserAgent

I have managed to log in with the code below, but I can only do it once a day.
After that I can't log in and get the login page in the response instead.
But when I print $reqstr from the code below and paste it into a browser (like Firefox), I can log in.
Wget doesn't work either. Only a normal browser does.
Sometimes it seems that I'm logged in, but I only get content like this:
"<html>\cJ<head>\cJ\cI<meta http-equiv=\"content-type\" content=\"text/html; charset=ISO-8859-1\"><meta http-equiv=\"expires\" content=\"0\"><meta http-equiv=\"pragma\" content=\"no-cache\">\cJ\cI<meta http-equiv=\"refresh\" content=\"0; URL='https://www.address.com/'\">\cJ</head>\cJ</html>\cJ"
I also noticed that while I can't log in, I get this output in the debugger:
_uri_canonical' => URI::https=SCALAR(0x17dad28)
-> REUSED_ADDRESS
'handlers' => HASH(0x22dc0c0)
'response_data' => ARRAY(0x22ee8b8)
0 HASH(0x22d9a48)
'callback' => CODE(0x22dba30)
-> &LWP::UserAgent::__ANON__[/usr/lib/perl5/vendor_perl/5.10.0/LWP/UserAgent.pm:682] in /usr/lib/perl5/vendor_perl/5.10.0/LWP/UserAgent.pm:679-682
1 HASH(0x22eea08)
'callback' => CODE(0x22d9cb8)
-> &LWP::Protocol::__ANON__[/usr/lib/perl5/vendor_perl/5.10.0/LWP/Protocol.pm:138] in /usr/lib/perl5/vendor_perl/5.10.0/LWP/Protocol.pm:135-138
Any clue?
Here is the code:
my $b = LWP::UserAgent->new(agent => 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.5) Gecko/20060719 Firefox/31.2.0',);
my $cookie_jar = HTTP::Cookies->new(
file => 'lwp_cookies.txt',
autosave => 1,
ignore_discard => 1,
);
$cookie_jar->clear;
$cookie_jar->clear_temporary_cookies;
$b->cookie_jar($cookie_jar);
my $url = "https://www.address.com";
my $r = $b->get($url);
$r->decoded_content =~ /FORM ACTION="(.*?)" METHOD/msgi;
my $a = "$url$1";
print $a."\n";
my $reqstr = $a."&LoginAction=Login&Number=55555&KPassword=passw&UserID=uid";
my $req = HTTP::Request->new(POST => $reqstr);
$req->header('Host', 'www.address.com');
$req->header('User-Agent', 'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0');
$req->header('Connection', 'keep-alive');
$req->header('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8');
my $c = $b->request($req);
You need to re-request that page with the Referer header set, using referer() on the HTTP::Request you pass to LWP::UserAgent (or see my second answer if you aren't wedded to that module).
sub login { # Code not tested and not really compilable, just a stub for you
my ($url, $referrer_url, @other_args) = @_; # array goes last so it does not swallow the scalars
# Add your login code from the question, up to calling $b->request()
$req->referer($referrer_url) if $referrer_url;
my $c = $b->request($req);
return $c; # Or return the response?
}
my $result1 = login($original_login_url); #first try
# Obtain the redirect_url from the response.
# If it was a 301 redirect, you can do it via
# my @redirects = $response->redirects();
my $referrer_url = $original_login_url;
my $result2 = login($redirect_url, $referrer_url);
References:
http://forums.devshed.com/perl-programming-6/lwp-meta-refresh-tag-handling-63484.html
http://www.herongyang.com/Perl/LWP-UserAgent-Follow-HTTP-Redirects.html
If you aren't dead set on using LWP::UserAgent, use WWW::Mechanize instead.
Best approach: use WWW::Mechanize::Plugin::FollowMetaRedirect. The SYNOPSIS is pretty short and to the point:
use WWW::Mechanize;
use WWW::Mechanize::Plugin::FollowMetaRedirect;
my $mech = WWW::Mechanize->new;
$mech->get( $url );
$mech->follow_meta_redirect;
# Optionally, skip emulating the waiting time
$mech->follow_meta_redirect( ignore_wait => 1 );
If you don't have access to that module, you can create your own, similar to this: http://www.perlmonks.org/?node_id=487286
(Basically, parse the returned content with a regex (shudder) to extract the refresh URL, then get that URL. As per my other answer, you might need to add the Referer header.)
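A rough sketch of that do-it-yourself approach, using only core Perl, is shown below. The helper name extract_meta_refresh_url is made up for illustration, the sample HTML is a trimmed version of the page from the question, and the regexes only cover a double-quoted content attribute, so treat it as a starting point and not a robust parser.

```perl
use strict;
use warnings;

# Hypothetical helper: return the target URL of a
# <meta http-equiv="refresh"> tag, or undef if there is none.
sub extract_meta_refresh_url {
    my ($html) = @_;
    # Find a <meta> tag whose http-equiv attribute is "refresh".
    my ($tag) = $html =~ /(<meta\b[^>]*http-equiv\s*=\s*["']?refresh["']?[^>]*>)/i
        or return;
    # Pull the URL=... part out of its double-quoted content attribute,
    # dropping an optional single quote around the URL.
    my ($url) = $tag =~ /content\s*=\s*"[^"]*?URL\s*=\s*'?([^'">\s]+)/i
        or return;
    return $url;
}

my $html = q{<html><head><meta http-equiv="expires" content="0">}
         . q{<meta http-equiv="refresh" content="0; URL='https://www.address.com/'">}
         . q{</head></html>};
my $next = extract_meta_refresh_url($html);
print "$next\n" if defined $next;
```

The returned URL can then be fetched with the same user agent, so the cookies set by the login response are sent along.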

Facebook Api callback not working

#!/usr/bin/perl -w
use WWW::Facebook::API;
use WWW::Facebook::API::Auth;
use WWW::Facebook::API::Canvas;
use HTTP::Request;
use LWP;
use CGI; # load CGI routines
$q = CGI->new; # create new CGI object
print $q->header, # create the HTTP header
$q->start_html('Facebook App'), # start the HTML
$q->h1('Facebook Authentication'), # level 1 header
$q->end_html; # end the HTML
my $facebook_api = '-------------';
my $facebook_secret = '----------------';
my $facebook_clientid = '---------------------';
my $client = WWW::Facebook::API->new(
desktop => 0,
api_version => '1.0',
api_key => $facebook_api,
secret => $facebook_secret,
callback => 'http://localhost/perl/facebook.pl',
);
$client->app_id($facebook_clientid);
$q->redirect($client->get_login_url());
After logging in to Facebook, the callback URL does not work; I get a message that the Facebook application is under construction. I don't want to specify the callback URL in Facebook itself. I want to specify the callback in the source code:
callback => 'http://localhost/perl/facebook.pl',
Facebook can't make the callback request to http://localhost - your localhost is not their localhost!
You have to use a public facing URL.

Perl ssl client auth using certificate

I'm having trouble getting the following code to work and have reached a point where I am stuck. I am trying to perform client-side authentication using a certificate during a POST request. I'm only interested in sending the client cert to the server and don't really need to check the server certificate.
Here is the curl command that I am trying to replicate:
curl --cacert caCertificate.pem --cert clientCerticate.pem -d "string" https://xx.xx.xx.xx:8443/postRf
I keep getting the following error in my Perl script:
ssl handshake failure
I guess I have two questions: what should the CRT7 and KEY8 variables point to, and is this the best way to send a POST request using client certificate authentication?
#!/usr/bin/perl
use warnings;
use strict;
use Net::SSLeay qw(post_https);
my $hostIp = "xx.xx.xx.xx";
my $hostPort = "8443";
my $postCommand = "/postRf/string";
my $http_method = 'plain/text';
my $path_to_crt7 = 'pathToCert.pem';
my $path_to_key8 = 'pathToKey.pem';
my ($page, $response, %reply_headers) =
post_https($hostIp, $hostPort, $postCommand, '',
$http_method, $path_to_crt7, $path_to_key8
);
print $page . "\n";
print $response . "\n";
See LWP::UserAgent and IO::Socket::SSL.
use strictures;
use LWP::UserAgent qw();
require LWP::Protocol::https;
my $ua = LWP::UserAgent->new;
$ua->ssl_opts(
SSL_ca_file => 'caCertificate.pem',
SSL_cert_file => 'clientCerticate.pem',
);
$ua->post(
'https://xx.xx.xx.xx:8443/postRf',
Content => 'string',
);
I haven't tested this code.
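If the private key lives in a separate file, as the question's pathToKey.pem suggests, a variant along these lines may be closer to what is needed. It is a sketch under assumptions: the file names are the placeholders from the question, SSL_key_file and SSL_verify_mode are IO::Socket::SSL options passed through ssl_opts, and server verification is switched off only because the question says it is not needed (doing so is insecure in general). The POST itself is left commented out, so the sketch only shows the configuration.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# File names are placeholders taken from the question.
my $ua = LWP::UserAgent->new(
    ssl_opts => {
        verify_hostname => 0,                 # skip the host name check
        SSL_verify_mode => 0,                 # 0 is SSL_VERIFY_NONE: server cert is not verified
        SSL_cert_file   => 'pathToCert.pem',  # client certificate
        SSL_key_file    => 'pathToKey.pem',   # matching private key
    },
);
print "client key: ", $ua->ssl_opts('SSL_key_file'), "\n";

# To send the POST (not executed here; needs LWP::Protocol::https):
# my $res = $ua->post('https://xx.xx.xx.xx:8443/postRf', Content => 'string');
# print $res->is_success ? $res->decoded_content : $res->status_line, "\n";
```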