WWW::Mechanize Perl - perl

I am writing a simple script to log in to a website, for learning purposes.
I get an error saying "No Form Defined".
How do I find out the form name?
Below is the code snippet (I found it on this forum).
use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Cookies;
my $mech = WWW::Mechanize->new();
my $url = "http://www.something.net";
$mech->cookie_jar->set_cookie(0,"start",1,"/",".something.net");
$mech->get($url);
$mech->form_name("frmLogin");
$mech->set_fields(user=>'user',passwrd=>'password');
$mech->click();
$mech->save_content("logged_in.html");
Does the code look alright?

The names of the forms, if any, are embedded in the content that you are retrieving. If you view the source for this page, for example, you will find many form elements. This one has the id add-comment-44827103:
<form id="add-comment-44827103"
class=""
data-placeholdertext="Use comments to ask for more information or suggest improvements. Avoid answering questions in comments."></form>
You can retrieve them with $mech->forms. This call returns a list of HTML::Form objects that you can interrogate further.
my ($form) = $mech->forms; # note ($var)=... for list context
my $form_id = $form->attr("id") || die "form on page doesn't have 'id' attr";
$mech->form_id($form_id);
...
There is also the $mech->form_number( index ) call
$mech->form_number(2); # select the 2nd form on the page
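To find out which names and ids a page offers, it can help to list every form together with its fields. The following is a minimal offline sketch that uses HTML::Form directly, the class of the objects that $mech->forms returns. The HTML string and the base URL are made up for illustration; in a real script the content would come from $mech->content.

```perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical page content; in a real script this would be $mech->content.
my $html = <<'HTML';
<html><body>
<form id="search" action="/search"><input type="text" name="q"></form>
<form name="frmLogin" action="/login" method="post">
  <input type="text" name="user">
  <input type="password" name="passwrd">
  <input type="submit" name="go" value="Log in">
</form>
</body></html>
HTML

# parse() takes a base URI so that relative action URLs can be resolved.
my @forms = HTML::Form->parse($html, 'http://www.something.net/');

my @summary;
my $n = 0;
for my $form (@forms) {
    $n++;    # same numbering as form_number(), which counts from 1
    my $name   = $form->attr('name') // '(none)';
    my $id     = $form->attr('id')   // '(none)';
    my @fields = grep { defined } map { $_->name } $form->inputs;
    push @summary, "form $n: name=$name id=$id fields=@fields";
}
print "$_\n" for @summary;
```

A form that reports name=(none) and id=(none) can still be selected by its position with form_number.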

Related

Getting different source code for same url

I'm trying to grab the headline from a Washington Post news page on the web with a simple Perl script:
#! /usr/bin/env perl
use strict;
use warnings;
use LWP::Simple;
use Web::Scraper;
my $url = 'https://www.washingtonpost.com/outlook/why-trump-is-flirting-with-abandoning-fox-news-for-one-america/2019/10/11/785fa156-eba4-11e9-85c0-85a098e47b37_story.html';
my $scraper = scraper{
process '//h1[@data-qa="headline"]', 'headline' => 'TEXT',
};
my $html = get($url);
print $html;
my $res = $scraper->scrape ($html);
The problem I'm having is that it works only about 1/2 of the time even when fetching the exact same URL. The source code that is returned is in a completely different format than other times.
Perhaps this is an anti-scraping measure for unknown agents? I'm not sure but it seems like it should never work at all if that was the case.
Is there a simple workaround I might employ like accepting cookies?
Modified $scraper to the following to get it to work with the different source code:
my $scraper = scraper {
process '//h1[@data-qa="headline"]', 'headline' => 'TEXT',
process '//h1[@itemprop="headline"]', 'headline2' => 'TEXT',
};
Either headline or headline2 will be populated, depending on which version of the page is returned.
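Picking whichever of the two keys was filled can be sketched as follows. The HTML string is a made-up stand-in for one of the two page variants, and the statements inside the scraper block are separated with semicolons as in the Web::Scraper synopsis.

```perl
use strict;
use warnings;
use Web::Scraper;

# Hypothetical stand-in for one of the two page variants.
my $html = '<html><body><h1 itemprop="headline">Example headline</h1></body></html>';

my $scraper = scraper {
    process '//h1[@data-qa="headline"]',  'headline'  => 'TEXT';
    process '//h1[@itemprop="headline"]', 'headline2' => 'TEXT';
};

my $res = $scraper->scrape($html);

# Take whichever key was populated; both may be missing if neither XPath matches.
my $headline = $res->{headline} // $res->{headline2};
print defined $headline ? "$headline\n" : "no headline found\n";
```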

Forms and POST using perl

I have this script, which is supposed to fill in the email form on http://facebook.com/recover.php.
As you know, you can search there by email, name or phone number.
I am trying to search by email and get the content of the page after the search has completed, to see whether the profile was found or not, but the code doesn't seem to work.
use HTTP::Request::Common;
use LWP::UserAgent;
$email="blabla\@hotmail.com";
my %data=(email=>$email);
my $user_agent = 'Mozilla/6.0';
my $Browser = LWP::UserAgent->new;
$Browser->agent($user_agent);
$ua=$Browser->post('https://www.facebook.com/recover.php',\%data);
if($ua->content=~/couldn\'t/){ #"couldn't" is part of the message displayed when
print "Not Found"; # input doesn't match
}
elsif ($ua->content=~/name/) {
print "Found";
}
else {
print "Not found";
}
$result=$ua->content;
open FILE,">","me.txt" or die $!;
print FILE $result;
close FILE;
use strict
make it compile under strict
review the manpage for LWP::UserAgent, there's a problem with your code that you'll have to discover on your own so you'll remember
review your variable names in light of the conventions used in the manpage
review the approach (considering the Facebook has an API, IIRC)
no need to escape the single quote in the regex
You should post your request to the URL in the action field of the form (while you're using the URL of the page which shows the form).
Also add any hidden field to your %data.
Have a look at the HTML code of the page (or use some sort of form inspector) to get the correct URL and the hidden fields (javascript code, if present, can further complicate things).
Then use strict (and use warnings as well) as already said by Lumi.
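The advice about the action URL and hidden fields can be illustrated without any network access. The sketch below lets HTML::Form build the request from a made-up form: the markup, the field name token and the example.com URLs are assumptions for illustration only and do not describe any real site.

```perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical form markup; a real page has its own action URL and hidden fields.
my $html = <<'HTML';
<form action="/recover/submit" method="post">
  <input type="hidden" name="token" value="AVr123">
  <input type="text" name="email">
  <input type="submit" value="Search">
</form>
HTML

# The second argument is the URL of the page that showed the form.
my ($form) = HTML::Form->parse($html, 'https://www.example.com/recover.php');

$form->value(email => 'blabla@hotmail.com');

# click() returns an HTTP::Request; nothing is sent until a user agent requests it.
my $request = $form->click;

print $request->method,  "\n";
print $request->uri,     "\n";
print $request->content, "\n";
```

The request targets the URL from the action attribute rather than the page URL, and the hidden field travels along with the email. Passing $request to an LWP::UserAgent's request method would actually send it.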

What is the reason for the error message `Can't locate object method "get_ok"` when using WWW::Mechanize::TreeBuilder?

I couldn't really figure out how to use WWW::Mechanize::TreeBuilder. Basically I get an HTML page using WWW::Mechanize. There is a //div[@class='cars'] whose text I want to extract.
I tried:
my $mech = WWW::Mechanize->new();
$mech->get('the url');
WWW::Mechanize::TreeBuilder->meta->apply($mech);
$mech->get_ok('//div[@class="cars"]');
print $mech->look_down(_tag => 'p')->as_trimmed_text . "\n";
It says:
Can't locate object method "get_ok" via package "Class::MOP::Class::__ANON__::SERIAL::2" at orpi_crawler.pl
get_ok is from Test::WWW::Mechanize which you neglected to load. Read the synopsis of WWW::Mechanize::TreeBuilder carefully.
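Note also that get_ok expects a URL, not an XPath expression. The extraction itself goes through look_down, which is an HTML::TreeBuilder method. As a minimal offline sketch, here is the same lookup with plain HTML::TreeBuilder (not the Mechanize role) on a made-up HTML string:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical page content standing in for what the crawler would fetch.
my $html = '<html><body><div class="cars">Ford, Fiat, Volvo</div></body></html>';

my $tree = HTML::TreeBuilder->new_from_content($html);

# look_down takes attribute/value pairs; _tag matches the element name.
my $div  = $tree->look_down(_tag => 'div', class => 'cars');
my $text = defined $div ? $div->as_trimmed_text : undef;

print defined $text ? "$text\n" : "no div.cars found\n";

$tree->delete;    # free the parse tree
```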

WWW::Mechanize Form Select

I am attempting to log in to YouTube with WWW::Mechanize and use forms() to print out all the forms on the page after logging in. My script is logging in successfully, and also successfully navigating to Youtube.com/inbox. However, for some reason Mechanize cannot see any forms at Youtube.com/inbox. It just returns blank. Here is my code:
#!"C:\Perl64\bin\perl.exe" -T
use strict;
use warnings;
use CGI;
use CGI::Carp qw/fatalsToBrowser/;
use WWW::Mechanize;
use Data::Dumper;
my $q = CGI->new;
$q->header();
my $url = 'https://www.google.com/accounts/ServiceLogin?uilel=3&service=youtube&passive=true&continue=http://www.youtube.com/signin%3Faction_handle_signin%3Dtrue%26nomobiletemp%3D1%26hl%3Den_US%26next%3D%252Findex&hl=en_US&ltmpl=sso';
my $mechanize = WWW::Mechanize->new(autocheck => 1);
$mechanize->agent_alias( 'Windows Mozilla' );
$mechanize->get($url);
$mechanize->submit_form(
form_id => 'gaia_loginform',
fields => { Email => 'myemail',Passwd => 'mypassword' },
);
die unless ($mechanize->success);
$url = 'http://www.youtube.com/inbox';
$mechanize->get($url);
$mechanize->form_id('comeposeform');
my $page = $mechanize->content();
print Dumper($mechanize->forms());
Mechanize is unable to see any forms at youtube.com/inbox, however, like I said, I can print all of the forms from the initial link, no matter what I change it to...
Thanks in advance.
As always, one of the best debugging approaches is to print what you get and check if it is what you were expecting. This applies to your problem too.
In your case, if you print $mechanize->content() you'll see that you didn't get the page you're expecting. YouTube wants you to follow a JavaScript redirect in order to complete your cross-domain login action. You have multiple options here:
parse the returned content manually – i.e. /location\.replace\("(.+?)"/
try to have your code parse JavaScript (have a look at WWW::Scripter)
[recommended] use YouTube API for managing your inbox
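The first option can be sketched as below. The content string and its URL are made up for illustration; a real response may escape characters inside the JavaScript string, in which case the captured URL would need to be unescaped before use.

```perl
use strict;
use warnings;

# Hypothetical response body; in the script above this would be $mechanize->content.
my $content = '<script>window.location.replace("http://www.example.com/signin?next=%2Finbox")</script>';

my $redirect;
if ($content =~ /location\.replace\("(.+?)"/) {
    $redirect = $1;    # the URL the JavaScript would have navigated to
}

print defined $redirect ? "$redirect\n" : "no redirect found\n";
```

The captured URL could then be passed to $mechanize->get to complete the login before requesting the inbox page.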

How can I access forms without a name or id with Perl's WWW::Mechanize?

I am having problems with my Perl program. It logs in to a specific web page and fills in a text area for the message and an input box for mobile numbers. Upon clicking the 'Send' button, the message is sent to the specified number. Sending messages already works, but I can't make it work for receiving messages/replies. I'm using the WWW::Mechanize module. Here is the part of my code for receiving messages:
$username = 'suezy';
$password = '123';
$url = 'http://..sample.cgi';
# ...
$mech->credentials($username, $password);
$mech->get($url);
$mech->submit();
My problem is that the form has no name. There are two buttons in this form, but I can't select which button to click, since no names are specified and the ids contain a space (e.g. form name='receive msg'..). I need to click on the second button, 'Receive'.
Question is, how will I be able to access the forms and buttons using mechanize module without using names?
You can pass a form_number argument to the submit_form method.
Or call the form_number method to affect which form is used by later calls to click or field.
Have you tried using HTTP::Recorder?
Have a look at the documentation and try it to see if it gives a reasonable result for you.
Seeing that there are only two buttons on your form, ysth's suggestion should be easy to implement.
use strict;
use warnings;
use WWW::Mechanize;
my $username = "suezy";
my $password = "123";
my $url = 'http://.../sample.cgi';
my $mech = WWW::Mechanize->new();
$mech->credentials($username,$password); # set credentials before the request
$mech->get($url);
And then:
$mech->click_button(number => 1); # if the 'Receive' button is 1
Or:
$mech->click_button(number => 2); # if the 'Receive' button is 2
A case of trial-and-error is more than adequate for you to figure out which button you're clicking.
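As an alternative to trial and error, the buttons can be listed in document order first. The sketch below is offline and uses HTML::Form on made-up markup. It assumes the buttons are plain submit inputs and that the form looks roughly like the one described in the question.

```perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical markup: one unnamed form with two unnamed submit buttons.
my $html = <<'HTML';
<form action="/sample.cgi" method="post">
  <textarea name="message"></textarea>
  <input type="text" name="mobile">
  <input type="submit" value="Send">
  <input type="submit" value="Receive">
</form>
HTML

my ($form) = HTML::Form->parse($html, 'http://www.example.com/');

# Submit inputs in document order; click_button(number => N) counts from 1.
my @buttons = grep { $_->type eq 'submit' } $form->inputs;
for my $i (0 .. $#buttons) {
    printf "button %d: %s\n", $i + 1, $buttons[$i]->value // '(no value)';
}
```

With WWW::Mechanize the same objects are available through $mech->current_form->inputs once a form has been selected.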
EDIT
I'm assuming that the relevant form has already been selected. If not:
$mech->form_number($formNumber);
where $formNumber is the number of the form on the page in question (counting from 1). Alternatively,
$mech->form_with_fields('username');
will select the form that contains a field named username.
hth