I'm trying to crawl this page using Perl LWP:
http://livingsocial.com/cities/86/deals/138811-hour-long-photo-session-cd-and-more
I had code that used to handle LivingSocial, but it seems to have stopped working. The idea was to crawl the page once, extract its cookie, set the cookie in the UserAgent, and crawl it twice more. Doing this used to get past the welcome page:
use LWP::UserAgent;
use HTTP::Cookies;

my $browser    = LWP::UserAgent->new;
my $cookie_jar = HTTP::Cookies->new;
$response = $browser->get($url);           # first request returns the welcome page and a cookie
$cookie_jar->extract_cookies($response);   # pull the Set-Cookie headers into the jar
$browser->cookie_jar($cookie_jar);         # send the cookie on subsequent requests
$response = $browser->get($url);
$response = $browser->get($url);
This seems to have stopped working for normal LivingSocial pages, but it still works for LivingSocial Escapes pages, for example:
http://livingsocial.com/escapes/148029-cook-islands-hotel-+-airfare
Any tips on how to get past the welcome page?
It looks like this page only works in a JavaScript-enabled browser (which LWP::UserAgent is not). You could try WWW::Mechanize::Firefox instead:
use WWW::Mechanize::Firefox;
my $mech = WWW::Mechanize::Firefox->new();
$mech->get($url);   # drives a real Firefox instance, so the page's JavaScript runs
Note that you must have Firefox and the mozrepl extension installed for this module to work.
Below is a script that grabs a cookie from "example.com" and encodes it in Base64.
It usually works, although for some reason there are random days where it acts up and does not grab any cookies at all.
I've checked at times when the script was failing, and the site would still send a cookie to the client.
Same method, nothing changed on the site's end, and nothing changed in the script, yet it would still act up sometimes.
Does anyone know what could be causing this?
Do I have to change my method of grabbing cookies, in case this approach is obsolete?
use LWP::UserAgent;
use MIME::Base64 qw(encode_base64);   # encode_base64 comes from MIME::Base64

my $ua = LWP::UserAgent->new;
$ua->cookie_jar({});                  # empty in-memory cookie jar

$ua->get("http://www.example.com");   # any cookies in the response land in the jar
my $cookie = encode_base64($ua->cookie_jar->as_string);
Other info: it's part of a Perl CGI script, hosted on a website.
I'm still unsure of the error that was causing this, but when I turned off CloudFlare for my website, the problem resolved itself immediately.
Oh well, there goes 3 hours of my life over CloudFlare...
At my work I build a lot of WordPress sites, and I also do a lot of cutting and pasting. To streamline this process I'm trying to make a crawler that can fill out and submit form information to WordPress. However, I can't get the crawler to operate correctly in the WordPress admin panel once I'm past the login.
I know the login form submission works because I've gotten the page back before. But this script doesn't seem to return the "Settings" page, which is what I want. I've been using this site as a guide for WWW::Mechanize: www.higherpass.com/Perl/Tutorials/Using-Www-mechanize/3/ but I could use some additional pointers. Here is my Perl script; I've tried a few variations, but I just need to be pointed in the right direction.
Thanks!
use strict;
use warnings;
use WWW::Mechanize;

my $m = WWW::Mechanize->new();
my $url2 = 'http://www.moversbatonrougela.com/wp-admin/options-general.php';
my $url  = 'http://www.moversbatonrougela.com/wp-admin';

$m->get($url);                  # fetch the login page
$m->form_name('loginform');     # select the WordPress login form
$m->set_fields('username' => 'user', 'password' => 'password');
$m->submit();                   # submit the login
my $response = $m->get($url2);  # then request the Settings page
print $response->decoded_content();
Put the lines below just before $m->submit();. Since WWW::Mechanize is a subclass of LWP::UserAgent, you can use any of LWP's methods.
$m->add_handler("request_send", sub { shift->dump; return });
$m->add_handler("response_done", sub { shift->dump; return });
The above enables request/response logging in your code. Look out for the response status codes, i.e. 200 (OK), 302 (Redirect), etc. The $m->get() request is probably being redirected, or the machine's IP is blocked by the server. If it's a redirect, $m->redirect_ok() decides whether a given redirect is followed, and $m->requests_redirectable (an LWP method) controls which request types follow redirects; a short sketch follows the sample log lines below. The logs should show something like:
HTTP/1.1 200 OK
OR
HTTP/1.1 302 Found
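If you see the 302, one minimal fix is a one-liner (a sketch, assuming the $m object from the question): WWW::Mechanize only follows redirects for GET and HEAD by default, so add POST to the redirectable methods:
# requests_redirectable returns an arrayref of HTTP method names;
# adding 'POST' makes the 302 returned by the login form get followed.
push @{ $m->requests_redirectable }, 'POST';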
If none of the above works, try the following alternative to $m->submit():
my $inputobject = $m->current_form()->find_input( undef, 'submit' );
$m->click_button(input => $inputobject);
I am trying to log in to a website using Perl. I have tried all the options - WWW::Mechanize, LWP::UserAgent, etc. - but still have not been able to log in successfully. I get a response code of 200, which means success, but how do I move on to the next page? Any help will be appreciated.
Make sure you are using cookies with LWP:
$ua->cookie_jar({ file => "$ENV{HOME}/.cookies.txt" });
After the login, just have the $ua request the next page.
If the login redirects you to another page and you want to follow that redirect, use
$ua->requests_redirectable
For more info, check out the docs at
http://search.cpan.org/~gaas/libwww-perl-6.04/lib/LWP/UserAgent.pm
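Putting those pieces together, a minimal sketch might look like the following; the /login and /account URLs and the form field names are assumptions for illustration:
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
$ua->cookie_jar({ file => "$ENV{HOME}/.cookies.txt", autosave => 1 });  # persist the session cookie
push @{ $ua->requests_redirectable }, 'POST';  # also follow the 302 a login POST often returns

# Log in; any session cookie in the response is stored in the jar.
my $login = $ua->post('http://www.example.com/login',
    { username => 'me', password => 'secret' });

# The next request sends the session cookie automatically.
my $next = $ua->get('http://www.example.com/account');
print $next->decoded_content if $next->is_success;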
I am getting the following error when following a JavaScript link using Perl's WWW::Mechanize:
Error GETing javascript:submt_os('2','contact%20info','contact%20info'):Protocol scheme 'javascript' is not supported
This is my code:
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
my $uri = "http://tinyurl.com/76xv4ld";
$mech->get($uri);
# error occurs on this link
$mech->follow_link( text => 'Contact Information' );
print $mech->content();
Once I get the page, I want to click Contact Information.
Is there any other way to do that?
You can't follow a JavaScript link with WWW::Mechanize. Even if you had a JavaScript interpreter, you'd need complete DOM support for anything non-trivial.
So you need to script a web browser. I use Selenium in my testing, which is quite bulky and requires Java. You might want to investigate WWW::Mechanize::Firefox instead; I've not used it, but it provides a Mechanize-style interface to Firefox.
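Untested, but based on its documentation a sketch would look like this (it assumes Firefox with the mozrepl extension is running; the URL and link text come from the question):
use WWW::Mechanize::Firefox;

my $mech = WWW::Mechanize::Firefox->new();
$mech->get('http://tinyurl.com/76xv4ld');
# The JavaScript behind the link runs inside the real browser.
$mech->follow_link( text => 'Contact Information' );
print $mech->content();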
Can I get some help on how to submit a POST with the necessary variables using WWW::Mechanize::Firefox? I've installed all the Perl modules and the Firefox plugin, and tested that I can connect to a given host and get responses. My question is how to submit a POST request. In the documentation, Corion says he may never implement it, which seems odd; I'm hoping I can use the methods inherited from Mechanize, but I can't find any examples. A simple example would help me tremendously.
my $mech = WWW::Mechanize::Firefox->new();
$mech->allow( javascript => 1 );   # enable JavaScript
$mech->get("http://www.example.com");
my $c = $mech->content;
Is there a $mech->post() option I am simply missing?
Many thanks in advance.
R
Normally you would just set the fields and submit the form like this:
$mech->get('http://www.website.com');
$mech->submit_form(
    with_fields => {
        user => 'me',
        pass => 'secret',
    }
);
Get a page, fill out a form, submit the form.
If you're going to skip those steps by issuing a raw POST, you don't need WWW::Mechanize::Firefox; see the sketch below.
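For completeness, a raw POST with plain LWP::UserAgent could look like this sketch; the URL and field names are placeholders:
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
# post() takes the URL plus a hashref of form fields and sends them
# application/x-www-form-urlencoded, like a browser form submission.
my $res = $ua->post('http://www.example.com/login',
    { user => 'me', pass => 'secret' });
print $res->status_line, "\n";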