I am trying to use WWW::Mechanize to automate a session with GeoServer.
GeoServer comes with a REST API, which can be used with curl. But at the moment it is impossible to create a datastore for ImageMosaicJDBC with the REST API, so I would like to add the new raster data source with a Perl script based on WWW::Mechanize.
But it fails with this message:
your session has expired.
The script is below:
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
use HTML::TreeBuilder;
use HTML::Tree;
use Getopt::Long;
use HTTP::Cookies;
my %CONF = (
username => 'admin',
password => 'geoserver',
);
GetOptions( \%CONF, "username=s", "password=s" ) or die "Bad options";
my $netloc = "193.55.67.151:8080";
my $url = "http://$netloc/geoserver/web/?wicket:bookmarkablePage=:org.geoserver.web.GeoServerLoginPage";
my $cookie_jar = HTTP::Cookies->new;
my $agent = WWW::Mechanize->new( cookie_jar => $cookie_jar );
$agent->agent('Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:44.0) Gecko/20100101 Firefox/44.0'); # header value only, without a "User-Agent=" prefix
# auth
$agent->get($url);
die $agent->res->status_line unless $agent->success;
$agent->set_fields(%CONF);
$agent->submit;
die $agent->res->status_line unless $agent->success;
# adding data store
$url = "http://$netloc/geoserver/web?wicket:bookmarkablePage=:org.geoserver.web.data.store.NewDataPage";
my $content = $agent->get($url);
die $agent->res->status_line unless $agent->success;
my $tree = HTML::Tree->new();
$tree->parse($content);
print $agent->content;
# storeform
$url = "http://$netloc/geoserver/web/?wicket:interface=:5:storeForm::IFormSubmitListener::";
my $content = $agent->post($url);
die $agent->res->status_line unless $agent->success;
my $tree = HTML::Tree->new();
$tree->parse($content);
print $agent->content;
# newdatapage
$url = "http://$netloc/geoserver/web/?wicket:interface=:6::::";
my $ref = "http://$netloc/geoserver/web/?wicket:bookmarkablePage=:org.geoserver.web.data.store.NewDataPage";
my $content = $agent->get( $url, referer => $ref);
die $agent->res->status_line unless $agent->success;
my $tree = HTML::Tree->new();
$tree->parse($content);
print $agent->content;
I cannot see where the problem comes from. In particular, I used Wireshark to inspect the HTTP exchanges, and everything looked fine to me. For example, the JSESSIONID cookie was correctly sent back.
Try setting the timeout parameter when you create $agent.
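A minimal sketch of that suggestion, assuming a 60-second timeout (an arbitrary illustrative value). Note that timeout is a client-side setting inherited from LWP::UserAgent, so it may well not change a session that the server itself considers expired:

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Assumption: 60 seconds is only an example value, not a recommendation.
my $agent = WWW::Mechanize->new(
    timeout    => 60,    # passed through to LWP::UserAgent
    cookie_jar => {},    # in-memory jar, so JSESSIONID is kept between requests
    autocheck  => 1,     # die automatically when a request fails
);

printf "timeout is %d seconds\n", $agent->timeout;
```

With autocheck enabled, the explicit "die ... unless $agent->success" lines in the script become optional.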
Related
I need to send requests to an HTTP server using LWP. For example, I have a file with data, and I must send requests to server foobar.baz.
use LWP::UserAgent;
$ua = LWP::UserAgent->new;
$ua->agent("$0/0.1 " . $ua->agent);
$ua->agent("Mozilla/8.0");
$req = HTTP::Request->new(GET => 'http://www.foobar.baz');
$req->header('Accept' => 'text/html');
$res = $ua->request($req);
How can I use file.txt in
$req = HTTP::Request->new(GET => 'http://www.foobar.baz')
for every request?
For example file.txt contains
aaaa
bbbb
cccc
dddd
eeee
I need to send a request to
aaaa.foobar.baz
bbbb.foobar.baz
cccc.foobar.baz
and so on.
How can I do it?
This is a very simple question, and I wonder why you can't even attempt it yourself.
It's just a matter of reading the file and building the complete URL from each line of text:
use strict;
use warnings 'all';
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->agent("$0/0.1 " . $ua->agent);
$ua->agent("Mozilla/8.0");
open my $fh, '<', 'file.txt' or die $!;
while ( <$fh> ) {
next unless /\S/;
chomp;
my $res = $ua->get( "http://$_.foobar.baz" ); # LWP needs the scheme in the URL
}
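As a small variation on the loop above, the URL-building step can be kept in its own subroutine so it can be checked without touching the network. The subroutine name urls_from_lines is hypothetical, and the http:// scheme is an assumption based on the question's example:

```perl
use strict;
use warnings;

# Build one URL per non-blank line.
# Assumptions: plain "http://" and one host label per line, as in the question.
sub urls_from_lines {
    my ($domain, @lines) = @_;
    my @urls;
    for my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;       # trim whitespace, including the newline
        next unless length $line;      # skip blank lines
        push @urls, "http://$line.$domain";
    }
    return @urls;
}

my @urls = urls_from_lines( 'foobar.baz', "aaaa\n", "\n", "bbbb\n" );
print "$_\n" for @urls;
```

Each resulting URL can then be passed to $ua->get as in the loop above.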
You might find App::SimpleScan on CPAN to be useful. I wrote it for just such an application back at Yahoo! in 2005. It handles combinatorial specifications of URLs, lets you snapshot the output, etc. Plugin-based with a fairly good set of plugins, so if it won't do exactly what you want out of the box, it shouldn't be hard for you to make it work.
I've created a Perl script that uses HTML::TableExtract to scrape data from tables on a site.
It works great for dumping table data from unsecured (HTTP) sites, but when I try HTTPS sites, it doesn't work: the tables_report line prints nothing, when it should print a bunch of table data.
However, if I take the content of that HTTPS page, save it to an HTML file, post it on an unsecured HTTP site, and point my script at that HTTP page, the script works as expected.
Does anyone know how I can get this to work over HTTPS?
#!/usr/bin/perl
use lib qw( ..);
use HTML::TableExtract;
use LWP::Simple;
use Data::Dumper;
# DOESN'T work:
my $content = get("https://datatables.net/");
# DOES work:
# my $content = get("http://www.w3schools.com/html/html_tables.asp");
my $te = HTML::TableExtract->new();
$te->parse($content);
print $te->tables_report(show_content=>1);
print "\n";
print "End\n";
The sites mentioned above for $content are just examples.. these aren't really the sites I'm extracting, but they work just like the site I'm really trying to scrape.
One option I guess is for me to use perl to download the page locally first and extract from there, but I'd rather not, if there's an easier way to do this (anyone that helps, please don't spend any crazy amount of time coming up with a complicated solution!).
The problem is the default user agent string that LWP::Simple sends, which that site blocks. Use LWP::UserAgent and set a user agent the site accepts, like this:
use strict;
use warnings;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
my $url = 'https://datatables.net/';
$ua->agent("Mozilla/5.0"); # set user agent
my $res = $ua->get($url); # send request
# check the outcome
if ($res->is_success) {
# ok -> I simply print the content in this example, you should parse it
print $res->decoded_content;
}
else {
# ko
print "Error: ", $res->status_line, "\n";
}
This is because datatables.net is blocking LWP::Simple requests. You can confirm this with the code below:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
print is_success(getprint("https://datatables.net/"));
Output:
$ perl test.pl
403 Forbidden <URL:https://datatables.net/>
You could try using LWP::RobotUA. The code below works fine for me.
#!/usr/bin/perl
use strict;
use warnings;
use LWP::RobotUA;
use HTML::TableExtract;
my $ua = LWP::RobotUA->new( 'bot_chankey/1.1', 'chankeypathak@stackoverflow.com' );
$ua->delay(5/60); # 5 second delay between requests
my $response = $ua->get('https://datatables.net/');
if ( $response->is_success ) {
my $te = HTML::TableExtract->new();
$te->parse($response->content);
print $te->tables_report(show_content=>1);
}
else {
die $response->status_line;
}
In the end, a combination of Miguel and Chankey's responses provided my solution. Miguel's made up most of my code, so I selected that as the answer, but here is my "final" code (got a lot more to do, but this is all I couldn't figure out.. the rest should be no problem).
I couldn't quite get either mentioned by Miguel/Chankey to work, but they got me 99% of the way.. then I just had to figure out how to get around the error "certificate verify failed". I found that answer with Miguel's method right away, so in the end, I mostly used his code, but both responses were great!
#!/usr/bin/perl
use lib qw( ..);
use strict;
use warnings;
use LWP::UserAgent;
use HTML::TableExtract;
use LWP::RobotUA;
use Data::Dumper;
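my $ua = LWP::UserAgent->new(
# Turns certificate verification off (insecure). The quoted string 'SSL_VERIFY_PEER' is not the constant and is treated as 0, so it had this same effect.
ssl_opts => { verify_hostname => 0, SSL_verify_mode => 0 },
);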
my $url = 'https://WebsiteIUsedWasSomethingElse.com';
$ua->agent("Mozilla/5.0"); # set user agent
my $res = $ua->get($url); # send request
# check the outcome
if ($res->is_success)
{
my $te = HTML::TableExtract->new();
$te->parse($res->content);
print $te->tables_report(show_content=>1);
}
else {
# ko
print "Error: ", $res->status_line, "\n";
}
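If the "certificate verify failed" error comes from a CA that the client does not know about, one alternative is to keep verification enabled and tell LWP which CA bundle to trust. This is only a sketch: the bundle path below is hypothetical and must be replaced with a file that exists on your system.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical path: use the CA bundle that can validate the site's certificate.
my $ca_file = '/etc/ssl/certs/ca-certificates.crt';

my $ua = LWP::UserAgent->new(
    ssl_opts => {
        verify_hostname => 1,           # keep hostname checking on
        SSL_ca_file     => $ca_file,    # handed through to the SSL layer
    },
);
$ua->agent('Mozilla/5.0');

print "verify_hostname: ", $ua->ssl_opts('verify_hostname'), "\n";
print "SSL_ca_file:     ", $ua->ssl_opts('SSL_ca_file'),     "\n";
```

The rest of the script (the get call and the HTML::TableExtract parsing) stays the same.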
my $url = "https://ohsesfire01.summit.network/reports/slices";
my $user = 'xxxxxx';
my $pass = 'xxxxxx';
my $ua = LWP::UserAgent->new;
my $request = HTTP::Request->new( GET => $url );
# authenticate
$request->authorization_basic($user, $pass);
my $page = $ua->request($request);
I have a script that scrapes my scores from a website. This website uses my Facebook account for logging in. For the past few days the script has not worked, and I get the following error:
Cookies Required:
Cookies are not enabled on your browser. Please enable cookies in your
browser preferences to continue.
I searched this forum and found a similar problem:
I tried the given solution, but it did not work.
I exported my Facebook login cookies to cookies.txt from Google Chrome with this plugin: https://chrome.google.com/webstore/detail/cookiestxt/njabckikapfpffapmjgojcnbfjonfjfg?hl=en and saved cookies.txt in the same folder as the script:
use WWW::Mechanize;
use use HTTP::Cookies::Netscape;
my $mech = WWW::Mechanize->new( cookie_jar => HTTP::Cookies::Netscape->new( file => $cookies.txt ) );
$mech->agent_alias('Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:38.0) Gecko/20100101 Firefox/38.0');
$mech->get("https://www.facebook.com/login.php");
$mech->submit_form(
fields => {
email => '<my email here>',
pass => '<my password here>',
}
);
open($out, ">", "output_page.html") or die "Can't open output_page.html: $!";
print $out $mech->content;
In the script I use a Firefox user agent, but the cookies were exported from Chrome. Could this be the issue? I have tried different methods and languages (Python and Perl).
Always put use strict and use warnings at the top of your script.
In your script, this line is the problem:
my $mech = WWW::Mechanize->new( cookie_jar => HTTP::Cookies::Netscape->new( file => $cookies.txt ) );
Change it to
my $cookies= 'cookies.txt'; #your cookies file path here
my $mech = WWW::Mechanize->new( cookie_jar => HTTP::Cookies::Netscape->new( file => $cookies ) );
With all the errors corrected, your script should be:
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Cookies::Netscape; # you had written "use" twice
my $cookies='your cookies file path';
my $mech = WWW::Mechanize->new( cookie_jar => HTTP::Cookies::Netscape->new( file => $cookies) );
$mech->agent_alias('Windows Mozilla'); # agent_alias expects a known alias, not a full user-agent string
$mech->get("https://www.facebook.com/login.php");
$mech->submit_form(
fields => {
email =>'your username',
pass => 'your password',
}
);
open(my $out, ">", "output_page.html") or die "Can't open output_page.html: $!"; # declare the filehandle with my
print $out $mech->content;
I suggest you read the WWW::Mechanize documentation.
Note: As many other users have pointed out, logging in with automated tools like this is against Facebook's ToS, so please look at the Facebook API instead.
There is no need to send credentials if you load correct cookies because this code works for me (as you can see you only need two cookies c_user and xs):
use strict;
use warnings;
use WWW::Mechanize;
use FindBin qw($Bin); # $Bin is the directory containing this script
my $config = {
auth_token =>
"c_user=100009668980203;xs=143%3AfaO_5XiRk_rWOg%3A2%3A1435225463%3A-1",
};
my @cookies = split /;/, $config->{auth_token};
my $mech = WWW::Mechanize->new();
foreach my $cookie (@cookies) {
my ( $cookie_name, $cookie_value ) = split /=/, $cookie, 2;
$mech->cookie_jar->set_cookie( 0, $cookie_name, $cookie_value, '/',
'.facebook.com', undef, 1, 1, undef, 1 );
}
$mech->get("https://www.facebook.com/messages/");
$mech->save_content("$Bin/Messages_test.htm", binmode => ':raw' );
I want to print the URL that a request is redirected to, in Perl.
Input url : http://pricecheckindia.com/go/store/snapdeal/52517?ref=velusliv
output url : http://www.snapdeal.com/product/vox-2-in-1-camcorder/1154987704?utm_source=aff_prog&utm_campaign=afts&offer_id=17&aff_id=1298&source=pricecheckindia
use LWP::UserAgent qw();
use CGI qw(:all);
print header();
my ($url) = "http://pricecheckindia.com/go/store/snapdeal/52517?ref=velusliv";
my $ua = LWP::UserAgent->new;
my $req = new HTTP::Request(GET => $url);
my $res = $ua->request($req);
print $res->request;
How can I do this in Perl?
You need to examine the HTTP response to find the URL. The documentation of HTTP::Response gives full details of how to do this, but to summarise, you should do the following:
use strict;
use warnings;
use feature ':5.10'; # enables "say"
use LWP::UserAgent;
my $url = "http://pricecheckindia.com/go/store/snapdeal/52517?ref=velusliv";
my $ua = LWP::UserAgent->new;
my $req = new HTTP::Request(GET => $url);
my $res = $ua->request($req);
# you should add a check to ensure the response was actually successful:
if (! $res->is_success) {
say "GET failed! " . $res->status_line;
}
# show the base URI for the response:
say "Base URI: " . $res->base;
You can view redirects using HTTP::Response's redirects method:
if ($res->redirects) { # are there any redirects?
my @redirects = $res->redirects; # a list of HTTP::Response objects
say join(", ", map { $_->request->uri } @redirects); # print the URL of each hop
}
else {
say "No redirects.";
}
In this case, the base URI is the same as $url, and if you examine the contents of the page, you can see why.
# print out the contents of the response:
say $res->decoded_content;
Right near the bottom of the page, there is the following code:
$(window).load(function() {
window.setTimeout(function() {
window.location = "http://www.snapdeal.com/product/vox-2-in-1-camcorder/1154987704?utm_source=aff_prog&utm_campaign=afts&offer_id=17&aff_id=1298&source=pricecheckindia"
}, 300);
});
The redirect is handled by javascript, and so is not picked up by LWP::UserAgent. If you want to get this URL, you will need to extract it from the response contents (or use a different client that supports javascript).
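Extracting the URL from the response contents could look like the sketch below. It assumes the page assigns a quoted string literal to window.location, as in the snippet above; the subroutine name js_redirect_target and the sample HTML are made up for illustration, and a regex like this will not cope with URLs that are built dynamically in JavaScript.

```perl
use strict;
use warnings;

# Return the first URL assigned to window.location (or window.location.href)
# as a quoted string literal, or undef if none is found.
sub js_redirect_target {
    my ($html) = @_;
    return $html =~ /window\.location(?:\.href)?\s*=\s*["']([^"']+)["']/
        ? $1
        : undef;
}

# Sample input standing in for $res->decoded_content.
my $html = <<'HTML';
<script>
$(window).load(function() {
    window.setTimeout(function() {
        window.location = "http://www.example.com/product/123?source=demo"
    }, 300);
});
</script>
HTML

my $target = js_redirect_target($html);
print defined $target ? "$target\n" : "no JavaScript redirect found\n";
```

In the real script you would call js_redirect_target($res->decoded_content) after checking $res->is_success.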
On a different note, your script starts off like this:
use LWP::UserAgent qw();
The list following the module name, qw(), names the subroutines to import into your script so that you can call them by their short names (instead of having to give the module name as well). An empty qw() tells Perl to import nothing; since LWP::UserAgent does not export anything anyway, you can just omit it.
LWP::UserAgent follows HTTP redirects by default; to control how many it will follow, set the max_redirect option:
use strict;
use warnings;
use LWP::UserAgent qw();
my $url = "http://pricecheckindia.com/go/store/snapdeal/52517?ref=velusliv";
my $ua = LWP::UserAgent->new( max_redirect => 5 );
my $res = $ua->get($url);
if ( $res->is_success ) {
print $res->decoded_content; # or whatever
} else {
die $res->status_line;
}
However, that website is using a JavaScript redirect.
$(window).load(function() {
window.setTimeout(function() {
window.location = "http://www.snapdeal.com/product/vox-2-in-1-camcorder/1154987704?utm_source=aff_prog&utm_campaign=afts&offer_id=17&aff_id=1298&source=pricecheckindia"
}, 300);
});
This will not work unless you use a framework that enables JavaScript, like WWW::Mechanize::Firefox.
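The last line of your script, print $res->request, does not print a URL: $res->request returns the HTTP::Request object behind the response, so it prints something like HTTP::Request=HASH(0x...). To print the body of the response, use $res->content instead: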
use LWP::UserAgent qw();
use CGI qw(:all);
print header();
my ($url) = "http://pricecheckindia.com/go/store/snapdeal/52517?ref=velusliv";
my $ua = LWP::UserAgent->new;
my $req = new HTTP::Request(GET => $url);
my $res = $ua->request($req);
print $res->content;
I am attempting to request a token from https://launchpad.net. According to the docs, all it wants is a POST to /+request-token with the form-encoded values of oauth_consumer_key, oauth_signature, and oauth_signature_method. Providing those items via curl works as expected:
curl --data "oauth_consumer_key=test-app&oauth_signature=%26&oauth_signature_method=PLAINTEXT" https://launchpad.net/+request-token
However, when I attempt to do it through my Perl script, I get a 401 Unauthorized error.
#!/usr/bin/env perl
use strict;
use YAML qw(DumpFile);
use Log::Log4perl qw(:easy);
use LWP::UserAgent;
use Net::OAuth;
$Net::OAuth::PROTOCOL_VERSION = Net::OAuth::PROTOCOL_VERSION_1_0A;
use HTTP::Request::Common;
use Data::Dumper;
use Browser::Open qw(open_browser);
my $ua = LWP::UserAgent->new;
my ($home) = glob '~';
my $cfg = "$home/.lp-auth.yml";
my $access_token_url = q[https://launchpad.net/+access-token];
my $authorize_path = q[https://launchpad.net/+authorize-token];
sub consumer_key { 'lp-ua-browser' }
sub request_url {"https://launchpad.net/+request-token"}
my $request = Net::OAuth->request('consumer')->new(
consumer_key => consumer_key(),
consumer_secret => '',
request_url => request_url(),
request_method => 'POST',
signature_method => 'PLAINTEXT',
timestamp => time,
nonce => nonce(),
);
$request->sign;
print $request->to_url;
my $res = $ua->request(POST $request->to_url, Content $request->to_post_body);
my $token;
my $token_secret;
print Dumper($res);
if ($res->is_success) {
my $response =
Net::OAuth->response('request token')->from_post_body($res->content);
$token = $response->token;
$token_secret = $response->token_secret;
print "request token ", $token, "\n";
print "request token secret", $token_secret, "\n";
open_browser($authorize_path . "?oauth_token=" . $token);
}
else {
die "something broke ($!)";
}
I tried both with $request->sign and without it, as I don't think signing is required during the request-token phase. Any help with this would be appreciated.
Update: I switched to LWP::UserAgent and had to pass in both POST and Content:
my $res = $ua->request(POST $request->to_url, Content $request->to_post_body);
Thanks
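For comparison with the curl call in the question, here is a minimal sketch that builds the same form-encoded body with HTTP::Request::Common and prints it without sending anything. The test-app consumer key is taken from the curl example; whether Launchpad accepts a given key is not something this sketch checks.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Same three fields as the curl --data string; an array ref keeps their order.
my $req = POST 'https://launchpad.net/+request-token', [
    oauth_consumer_key     => 'test-app',
    oauth_signature        => '&',            # becomes %26 once form-encoded
    oauth_signature_method => 'PLAINTEXT',
];

print $req->method, ' ', $req->uri, "\n";
print $req->header('Content-Type'), "\n";
print $req->content, "\n";

# To actually send it, pass $req to an LWP::UserAgent object's request method.
```

Printing the body this way makes it easy to compare what the script would send against the string that curl sends.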
Sorry, I'm not able to verify this from my tablet, but with recent versions of LWP, HTTPS support is a separate distribution, so you should install it and load it with
use LWP::Protocol::https;
http://blogs.perl.org/users/brian_d_foy/2011/07/now-you-need-lwpprotocolhttps.html
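A quick way to check whether that module is available on a given machine is sketched below; it only tries to load the module and reports the result, and the wording of the messages is of course arbitrary.

```perl
use strict;
use warnings;

# Try to load LWP::Protocol::https at run time without dying if it is missing.
my $has_https = eval { require LWP::Protocol::https; 1 } ? 1 : 0;

print $has_https
    ? "LWP::Protocol::https is installed\n"
    : "LWP::Protocol::https is missing; install it from CPAN\n";
```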