Perl WWW::Mechanize: link redirect problem

I use WWW::Mechanize::Shell to test things.
My code is:
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Cookies;
my $url = "http://mysite/app/login.jsp";
my $username = "username";
my $password = "asdfasdf";
my $mech = WWW::Mechanize->new();
$mech->cookie_jar(HTTP::Cookies->new());
# Log in through the first form on the page (j_username / j_password)
$mech->get($url);
$mech->form_number(1);
$mech->field(j_username => $username);
$mech->field(j_password => $password);
$mech->click();
# Navigate to the page of interest
$mech->follow_link(text => "LINK A", n => 1);
$mech->follow_link(text => "LINK B", n => 1);
........................
........................
........................
etc, etc.
The problem is this:
LINK B (web_page_b.html) redirects to web_page_x.html.
If I print $mech->content(), it shows web_page_b.html,
but I need web_page_x.html so that I can automatically submit the HTML form on it.
The question is:
How can I get web_page_x.html?
Thanks

Why don't you first test to see if the code containing the redirect (I'm guessing it's a <META> tag?) exists on web_page_b.html, then go directly to the next page once you're sure that that's what a browser would have done.
This would look something like:
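$mech->follow_link(text => "LINK B", n => 1);
# Capture the target of a tag such as <meta http-equiv="refresh" content="5;url=web_page_x.html">
my ($expected_redirect) = $mech->content() =~ /<meta\s+http-equiv=["']?refresh["']?\s+content=["']?\d+\s*;\s*url=([^"'>\s]+)/i
    or die("Test failed: web_page_b.html does not contain META refresh tag!");
# $expected_redirect should be 'web_page_x.html'. get() resolves a relative
# URL against the current page, so you should not need to add the server name.
$mech->get($expected_redirect);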
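The regex above only matches one fairly exact spelling of the tag. A slightly more forgiving extractor is sketched below as a hypothetical helper, meta_refresh_url (it is not part of WWW::Mechanize). It assumes the redirect really is a META refresh tag and uses plain Perl regexes rather than an HTML parser, so treat it as a starting point rather than a complete solution:

```perl
use strict;
use warnings;

# Return the URL from the first <meta http-equiv="refresh"> tag in $html,
# or undef if the page has none. Accepts either attribute order and double,
# single or no quotes (unquoted values must not contain spaces).
sub meta_refresh_url {
    my ($html) = @_;
    while ($html =~ /<meta\b([^>]*)>/gi) {
        my $attrs = $1;
        next unless $attrs =~ /http-equiv\s*=\s*["']?refresh\b/i;

        # Pull out the value of the content attribute, whatever its quoting.
        next unless $attrs =~ /\bcontent\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/i;
        my $content = defined $1 ? $1 : defined $2 ? $2 : $3;

        # content looks like "<delay>; url=<target>", e.g. "5;url=web_page_x.html"
        return $1 if $content =~ /^\s*\d*\s*[;,]\s*url\s*=\s*["']?([^"'\s]+)/i;
    }
    return;
}
```

With $mech sitting on web_page_b.html, it could be used as `my $next = meta_refresh_url($mech->content()) or die "no META refresh tag\n";` followed by `$mech->get($next);`.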
Incidentally, if you're doing any kind of testing with WWW::Mechanize, you should really check out Test::WWW::Mechanize and the other Perl testing modules! They make life a lot easier.

If it doesn't actually redirect, you'd be better off matching the link with a regex in follow_link rather than with plain text,
for example:
$mech->follow_link(url_regex => qr/web_page_b/i, n => 1);
Do the same for the other link.

Related

Used WWW::Mechanize to log in and download a file in a Perl script, but cannot get to the actual page content

I am trying to log in to a site using WWW::Mechanize in a Perl script. It does log in and reach the index page, but when I fetch the page that contains the file hyperlinks, its content suggests that I am no longer logged in.
Secondly, how can I download a file from an indirect hyperlink that points to an action which then downloads the file? Please advise, as there is not much help available for this particular use case.
Below is my code:
use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Cookies;
use LWP;
my $username = 'user';
my $password = 'pass';
my $mech = WWW::Mechanize->new();
$mech->cookie_jar(HTTP::Cookies->new());
# Log in through the form named 'form'
$mech->get('some_website/index.html?#');
$mech->form_name('form');
$mech->field('user' => $username);
$mech->field('password' => $password);
$mech->submit();
# Fetch the page that lists the file hyperlinks
$mech->get('actual_page_address_with_file_hyperlinks?adfgenDispatchAction=examine&idProgressivo=0&idFlusso=4200212');
#print $mech->content();
$mech->save_content("result.html");
my @links = $mech->links();
for my $link (@links) {
    #if (index($link->text, "Scarica") != -1)
    #{
    printf "%s, %s\n", $link->text // '', $link->url;
    #}
}
The website was using a CSRF token, and since the login page had no hidden field holding it, generating a token in the script did not help.
I switched to WWW::Selenium, which let me log in to the site and download the files. Issue resolved.

WWW::Mechanize follows a second redirect after six seconds

I am using Perl with the WWW::Mechanize module to submit a form to a web page and save the result to a file. I know how to submit forms and save the data, but I can't save the data that appears after this six-second redirect.
After the form is submitted, the page is redirected to a page that says
Results should appear in this window in approximately 6 seconds...
and then it is redirected again to the page with the result I want. My script follows the first redirect but not the second, and there is no link saying something like "click here if not redirected".
Here is my script:
use strict;
use warnings;
use WWW::Mechanize;
my $mech = WWW::Mechanize->new(autocheck => 1);
$mech->get( "http://tempest.wellesley.edu/~btjaden/TargetRNA2/index.html");
my $result = $mech->submit_form(
    form_number => 1,
    fields => {
        text     => 'Escherichia coli str. K-12 substr. MG1655',
        sequence => '>RyhB' . "\n" .
                    'GCGATCAGGAAGACCCTCGCGGAGAACCTGAAAGCACGACATTGCTCACATTGCTTCCAGTATTACTTAGCCAGCCGGGTGCTGGCTTTT',
    }
);
$mech->save_content('result');
What you need to do is extract the redirect URL and request it manually.
Try this:
use strict;
use warnings;
use WWW::Mechanize;
my $mech = WWW::Mechanize->new( autocheck => 1 );
$mech->get( "http://tempest.wellesley.edu/~btjaden/TargetRNA2/index.html");
my $result = $mech->submit_form(
    form_number => 1,
    fields =>
    {
        text     => 'Escherichia coli str. K-12 substr. MG1655',
        sequence => '>RyhB' . "\n" .
                    'GCGATCAGGAAGACCCTCGCGGAGAACCTGAAAGCACGACATTGCTCACATTGCTTCCAGTATTACTTAGCCAGCCGGGTGCTGGCTTTT',
    }
);
my $content = $mech->content;
my $url1 = 'http://tempest.wellesley.edu/~btjaden/cgi-bin/';
# The waiting page redirects via URL=targetRNA2.cgi?... (note the escaped "?")
my ($url2) = $content =~ /URL=(targetRNA2\.cgi\?[^"]+)"/
    or die "No redirect URL found on the waiting page\n";
$mech->get($url1 . $url2);
$mech->save_content('result');
WWW::Mechanize and meta refresh
Does the "6 seconds" page contain something like the line below? (You can use the save_content method of WWW::Mechanize to save the page to a file.)
<meta http-equiv="refresh" content="5; url=http://example.com/">
If yes:
Take a look at the source of WWW::Mechanize::Plugin::FollowMetaRedirect.
It shows how WWW::Mechanize can follow a meta refresh redirect.
It may well solve your problem.
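As an illustration of the same idea (a hand-rolled sketch, not the plugin's actual code), the hypothetical helper below keeps following META refresh tags until it reaches a page without one, which covers the two-step "6 seconds" redirect from the question. It assumes the tag lists http-equiv before content, as in the example line above, and it accepts any object with content() and get() methods, such as a WWW::Mechanize instance:

```perl
use strict;
use warnings;

# Follow <meta http-equiv="refresh"> redirects from the current page of $mech
# until a page without one is reached. $mech can be anything with content()
# and get() methods, such as a WWW::Mechanize object. Assumes http-equiv
# appears before content inside the tag. Returns the number of hops followed.
sub follow_meta_refreshes {
    my ($mech, %opt) = @_;
    my $max_hops = $opt{max_hops} // 5;     # guard against refresh loops
    my $max_wait = $opt{max_wait} // 10;    # cap on the delay we honour (seconds)

    for my $hop (1 .. $max_hops) {
        my ($delay, $url) = $mech->content() =~
            /<meta[^>]+http-equiv=["']?refresh["']?[^>]*content=["']?(\d+)\s*;\s*url=([^"'>\s]+)/i
            or return $hop - 1;             # no refresh tag: this is the final page

        # Wait as a browser would (e.g. the "6 seconds"), but never longer than $max_wait
        sleep($delay < $max_wait ? $delay : $max_wait);

        # WWW::Mechanize's get() resolves a relative URL against the current page
        $mech->get($url);
    }
    die "Still redirecting after $max_hops hops\n";
}
```

After the submit_form call in the question's script, `follow_meta_refreshes($mech);` followed by `$mech->save_content('result');` would save whatever page the chain ends on. Capping the delay keeps a misconfigured page from stalling the script, and the hop limit guards against a page that refreshes to itself.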

Perl mechanize script no form defined

I'm getting the error No form defined at cqSubmitter.pl at line 33, which is the second set_fields call. Other times I get Error POSTing http://micron.com Internal Server Error at line 39, which corresponds to the last click_button call.
I'm not really sure what's going on, or why it says no form is defined. The first half of the code, including the first click_button call, works fine and saves the correct page, but when I call set_fields the second time, it errors out.
Is anyone familiar enough with the Mechanize package to see what's going on here?
use Data::Dumper;
use HTTP::Request::Common qw(GET);
use WWW::Mechanize;
#Prepopulated information
my $types_ = "";
my $dept_ = "";
my $group_ = "";
#Create new WWW::Mechanize object
my $mech = WWW::Mechanize->new( 'ssl_opts' => { 'verify_hostname' => 0 } );
my $url = "http://f2prbrequest";
#Fetch URL or Die Tryin'
$mech->get($url);
my $fname = "user";
my $pswd = "password";
#Login to ClearQuest form using credentials
$mech->set_fields(
USER => $fname
,PASSWORD => $pswd
);
$mech->click_button(
name => 'Submit'
);
#Set fields and actually fill out ClearQuest Form
$mech->set_fields(
types => $types_
,dept => $dept_
,group => $group_
);
$mech->click_button(
name => 'submit1'
);
$mech->save_content("clearQuestFilled.html");
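One way to narrow this down is to look at what page Mechanize actually landed on after the first click_button. set_fields works on the current form, so "No form defined" suggests that the page returned by the login has no form at all (for example a redirect or frameset page). The hypothetical helper below, describe_page, prints the URL, status and forms of the current page; it relies only on the uri, status and forms methods of WWW::Mechanize and the attr and inputs methods of HTML::Form:

```perl
use strict;
use warnings;

# Summarise the page WWW::Mechanize is currently on: URL, HTTP status and
# every form with its named fields. Uses only $mech->uri, ->status, ->forms
# and the attr/inputs methods of the HTML::Form objects that forms() returns.
sub describe_page {
    my ($mech) = @_;
    my @forms = $mech->forms();
    my @lines = (
        sprintf('URL: %s',           $mech->uri()),
        sprintf('Status: %s',        $mech->status()),
        sprintf('Forms on page: %d', scalar @forms),
    );
    for my $i (0 .. $#forms) {
        my $form  = $forms[$i];
        # Unnamed inputs (e.g. some submit buttons) have an undefined name
        my @names = grep { defined } map { $_->name } $form->inputs;
        push @lines, sprintf('  form %d: name=%s fields=%s',
            $i + 1, $form->attr('name') // '(none)', join(',', @names));
    }
    return join("\n", @lines) . "\n";
}
```

Calling `print describe_page($mech);` right after the first click_button shows whether the types/dept/group form is really on that page, or whether another get or follow_link is needed first.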

WWW::Mechanize get content from page

I've used WWW::Mechanize to log in to the site.
Now that I'm logged in, I want the WWW::Mechanize script to go to payments.php and find the user's active subscription (for example VIP Access), which is in <p class="description">.
From this I want to read what the subscription is and then take the matching action. For example, if the user's package is VIP Small, print PKG: VIP Small, and if it is VIP Full, print PKG: VIP Full.
Does anyone know of a way to do this? Code used so far (written in my Ubuntu virtual machine):
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
my $forum = "http://localhost/forums/forum.php";
print "Username\r\n";
my $username = <>;
chomp($username);
print "Password\r\n";
my $password = <>;
chomp($password);
# do login
my $mech = WWW::Mechanize->new(autocheck => 1, agent => 'Perl WWW::Mechanize');
$mech->get($forum);
$mech->submit_form(form_number => 1, fields => { vb_login_username => $username, vb_login_password => $password });
print "this far";
$mech->follow_link(text => "Click here if your browser does not automatically redirect you.");
I think you need
$mech->get('http://localhost/forums/payments.php');
but I cannot help you get information from there without seeing the HTML of the page.
You need to parse the resulting HTML. I recommend HTML::TreeBuilder::XPath for tasks like this:
use HTML::TreeBuilder::XPath;
my $tree = HTML::TreeBuilder::XPath->new_from_content( $mech->content() );
my ($description) = $tree->findvalues('//p[ @class = "description" ]');
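Once the description text has been extracted, choosing the output the question asks for is plain string matching. The sketch below assumes the two package names from the question (VIP Small and VIP Full) and a case-insensitive substring match; package_line is a hypothetical helper name:

```perl
use strict;
use warnings;

# Package names taken from the question; add more as needed.
my @PACKAGES = ('VIP Small', 'VIP Full');

# Map the text of <p class="description"> to the "PKG: ..." line, using a
# case-insensitive substring match. Returns nothing (undef in scalar
# context) if the description is missing or no known package matches.
sub package_line {
    my ($description) = @_;
    return unless defined $description;
    for my $name (@PACKAGES) {
        return "PKG: $name" if index(lc $description, lc $name) >= 0;
    }
    return;
}
```

Used with the XPath result: `my $line = package_line($description); print defined $line ? "$line\n" : "Unknown package\n";`.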

How can I log in to YouTube using Perl?

I am trying to write a Perl script to log in to my YouTube account, but it doesn't seem to work. Basically I just want to connect to my account, and I don't even know how I could debug this! Maybe it is something related to the HTTPS protocol?
Please enlighten me! Thanks in advance.
use HTTP::Request::Common;
use LWP::UserAgent;
use strict;
my $login="test";
my $pass = "test";
my $res = "";
my $ua = "";
# Create user agent, make it look like FireFox and store cookies
$ua = LWP::UserAgent->new;
$ua->agent("Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20051213 Firefox/1.0.7");
$ua->cookie_jar ( {} );
# Request login page
$res = $ua->request(GET "https://www.google.com/accounts/ServiceLogin?service=youtube&hl=en_US&passive=true&ltmpl=sso&uilel=3&continue=http%3A//www.youtube.com/signup%3Fhl%3Den_US%26warned%3D%26nomobiletemp%3D1%26next%3D/index");
die("ERROR1: GET http://www.youtube.com/login\n") unless ($res->is_success);
# Now we login with our user/pass
$res = $ua->request(
POST "https://www.google.com/accounts/ServiceLoginAuth?service=youtube",
Referer => "http://www.youtube.com/login",
Content_Type => "application/x-www-form-urlencoded",
Content => [
currentform => "login",
next => "/index",
username => $login,
password => $pass,
action_login => "Log+In"
]
);
# YouTube redirects (302) to a new page when login is success
# and returns OK (200) if the login failed.
#die("ERROR: Login Failed\n") unless ($res->is_redirect());
print $res->content;
What I am doing is learning the web features of Perl, so I don't want to use any library except libwww or Mechanize to get the job done.
How can I just connect to my account using a Perl script? That is my objective for now.
I hope someone can post a script or correct mine.
Thank you all for your help.
I am testing WebScarab now.
What data are you trying to grab? Why not just use an existing implementation like WebService::YouTube?
Some comments on your code: I always avoided the shortcut $ua->request(GET/POST) form, since I always ended up needing the extra flexibility that only building HTTP::Request and HTTP::Response objects directly allowed. I always felt the code was cleaner that way, too.
Why is your code not working? Who knows.
Make sure your cookie jar is adding your cookies to the outgoing HTTP::Request. I'd suggest dumping all the headers when you log in with a browser and comparing them with the headers and data that libwww is sending. There may be some additional fields that they check for and that vary on every hit. They may also be checking your User-Agent string. If you are just looking to learn libwww, I'd suggest using a different site as a target, as I'm sure YouTube has all sorts of anti-scripting hardening.
Are you using YouTube's stable documented API?
Use an HTTP proxy such as WebScarab to watch the data flow.
Trey's suggestion to use somebody else's CPAN package for the mechanics is a good idea too.
By and large, what you want to do is define a cookie jar for most of these websites that have a login with redirects. That is what the package does. The package also tunes a lot of the lookups and scrapes based on the YouTube spec.
Ajax content, for example, will be rough, since it's not there when you're scraping.
You just picked a somewhat rough page to start out with.
Enjoy
I'm actually working on this issue myself. First, I would suggest reading over the API guide from Google as a good starting reference. If I'm reading it correctly, you begin by passing user credentials through a REST interface to get an authentication token. To handle that, I'm using the following:
use LWP::UserAgent;

sub getToken {
    my %parms = @_;
    my $response = LWP::UserAgent->new->post(
        'https://www.google.com/youtube/accounts/ClientLogin',
        [
            Email   => $parms{'username'},
            Passwd  => $parms{'password'},
            service => "youtube",
            source  => "<<Your Value Here>>",
        ]
    );
    die "Login request failed: ", $response->status_line, "\n"
        unless $response->is_success;
    my $content = $response->content;
    # The response body holds one Key=Value pair per line
    my ($auth) = $content =~ /^Auth=(.*)$/m
        or die "Unable to authenticate?\n";
    my ($user) = $content =~ /YouTubeUser=(.*)$/m
        or die "Could not extract user name from response string. ";
    return ($auth, $user);
}
And I call that from the main part of my program like this:
## Get $AuthToken
my ($AuthToken, $GoogleUserName) = getToken(
    username => $email, password => $password
);
Once I have these two values ($AuthToken and $GoogleUserName), the next step is the LWP POST that uploads the video, which I'm still testing. This unit is still a work in progress:
use Data::Dumper;

sub test {
    my %parms = @_;
    ## Copy the file contents, using the three-argument form of open.
    my $fileSize = -s $parms{'File'};
    open(my $videoFile, '<', $parms{'File'}) or die "Can't open $parms{'File'}: $!";
    binmode $videoFile;
    read($videoFile, my $fileContents, $fileSize) or die "Can't read $parms{'File'}: $!";
    close $videoFile;
    # Headers are passed as key/value pairs after the URL; inside an array ref
    # they would be sent as form fields instead. <...> values are still placeholders.
    my $r = LWP::UserAgent->new->post(
        "http://uploads.gdata.youtube.com/feeds/api/users/$parms{'user'}/uploads",
        Host                 => "uploads.gdata.youtube.com",
        'Authorization'      => "AuthSub token=\"$parms{'auth'}\"",
        'GData-Version'      => "2",
        'X-GData-Key'        => "key=$YouTubeDeveloperKey",
        'Slug'               => $parms{'File'},
        'Content-Type'       => "multipart/related; boundary=\"<boundary_string>\"",
        'Content-Length'     => "<content_length>",
        'video_content_type' => "video/wmv",
        'Connection'         => "close",
        'Content'            => $fileContents,
    );
    print Dumper(\$r->content);
}
And that is called as:
test(auth => $AuthToken, user => $GoogleUserName, File => 'test.wmv');