Automatic Search Using WWW::Mechanize - perl

I am trying to write a Perl script which will automatically key in search variables on this LexisNexis search page and retrieve the search results.
I am using the WWW::Mechanize module, but I am not sure how to figure out the field name of the search bar itself. This is the script I have so far:
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
my $m = WWW::Mechanize->new();
my $url = "http://www.lexisnexis.com/hottopics/lnacademic/?verb=sr&csi=379740";
$m->get($url);
$m->form_name('f');
$m->field('q', 'Test');
my $response = $m->submit();
print $response->content();
However, I think the name of the search box on this website is not "q". I am getting the following error: "Can't call method "value" on an undefined value at site/lib/WWW/Mechanize.pm line 1442." Any help is much appreciated. Thank you!

If you disable JavaScript in your browser, you will notice that the search form doesn't load. That means the form is generated by JavaScript, which is why WWW::Mechanize cannot find it: WWW::Mechanize does not execute JavaScript. Have a look at WWW::Mechanize::Firefox, which might help you with your task. Check out its example scripts, cookbook and FAQs.
You can also do the same using Selenium, see Gabor's tutorial on Selenium.
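To answer the "how do I find the field name" part directly, one option is to list every form and input that the static HTML actually contains. The sketch below uses HTML::Form (the class WWW::Mechanize uses for its form objects) on a small hypothetical page; the helper name list_form_fields and the sample HTML are assumptions for illustration only. With WWW::Mechanize, `$m->forms` after `$m->get($url)` should give you the same kind of objects. If the list comes back empty, the form is most likely injected by JavaScript, as described above.

```perl
use strict;
use warnings;
use HTML::Form;

# Return one "form: field (type)" string per input found in the HTML.
# list_form_fields is a hypothetical helper name, not part of any module.
sub list_form_fields {
    my ($html, $base_url) = @_;
    my @fields;
    for my $form (HTML::Form->parse($html, $base_url)) {
        my $form_name = $form->attr('name') // '(unnamed form)';
        for my $input ($form->inputs) {
            my $name = $input->name // '(unnamed)';
            push @fields, sprintf '%s: %s (%s)', $form_name, $name, $input->type;
        }
    }
    return @fields;
}

# Hypothetical static page standing in for a real response body.
my $html = <<'HTML';
<form name="f" action="/search">
  <input type="text" name="q">
  <input type="submit" name="go" value="Search">
</form>
HTML

print "$_\n" for list_form_fields($html, 'http://example.com/');
```

The names printed here are the ones you would pass to `form_name` and `field`.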

Related

Fetch a URL 100 times using Perl

My problem is that I need to request one URL. I cannot give the exact link, but it performs a request and looks like http://link.com/?name=name&password=password& and so on.
I need to fetch this URL 100 times in a row. I cannot do this manually in a browser because it takes too much time.
Is there a way to request this link 100 times in a row with a Perl script (just request it, as if you pasted the link into a browser and pressed Enter)?
I have not used Perl before, so I am asking for help directly. I googled for some information and wrote a small script, but I seem to be missing something:
#!/usr/bin/perl -w
use LWP::Simple;
my $uri = 'http://my link here';
my $content = get $uri;
Could you please advise me how to finish this script?
Use a (simple) for loop.
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
my $uri = 'http://my link here';
get $uri for 1 .. 100;
Update: Just read in a comment that you don't care about the returned data, so I've edited my answer to remove the unnecessary $content variable.
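If you do want to know whether each request succeeded, a variation with LWP::UserAgent can count the successes. This is only a sketch: fetch_repeatedly is a hypothetical helper name, and the URL is taken from the command line rather than hard-coded.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Fetch $url $times times with the given agent; return the number of
# successful responses and warn about each failure.
# fetch_repeatedly is a hypothetical helper name.
sub fetch_repeatedly {
    my ($ua, $url, $times) = @_;
    my $ok = 0;
    for my $attempt (1 .. $times) {
        my $response = $ua->get($url);
        if ($response->is_success) {
            $ok++;
        }
        else {
            warn "Attempt $attempt failed: ", $response->status_line, "\n";
        }
    }
    return $ok;
}

# Run only when a URL is given on the command line.
if (my $url = shift @ARGV) {
    my $ua = LWP::UserAgent->new(timeout => 10);
    my $ok = fetch_repeatedly($ua, $url, 100);
    print "$ok of 100 requests succeeded\n";
}
```

Save it under any name (for example fetch100.pl) and run it as `perl fetch100.pl 'http://your-link'`.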

Can't call method "text" on an undefined value at sample.pl line 11

use strict; # safety net
use warnings; # safety net
use feature 'say'; # a better "print"
use Mojo;
my $dom = Mojo::DOM->new;
my $ua = Mojo::UserAgent->new;
$dom= $ua->get('http://search.cpan.org/faq.html')->res->dom;
my $desc=$dom->at('#u')->text;
When I run this code, the above error occurs. My input is the web page at the URL in the code; please refer to it.
I want output containing only the answers, like this:
CPAN Search is a search engine for the distributions, modules, docs, and ID's on CPAN. It was conceived and built by Graham Barr as a way to make things easier to navigate. Originally named TUCS [ The Ultimate CPAN Search ] it was later named CPAN Search or Search DOT CPAN.
If you are having technical difficulties with the site itself, send mail to cpansearch@perl.org and try to be as detailed as possible in your note describing the problem.
Can anyone please help me?
Something like this:
perl -Mojo -le 'print r g("http://search.cpan.org/faq.html")->dom("#cpansearch > div.pod > p")->map("text")->to_array;'
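The error in the question most likely arises because `at` returns undef when no element matches the selector (there is presumably no element with id "u" on that page), and calling `text` on undef fails. The sketch below shows the same selector approach as a script, using a small hypothetical HTML string in place of the real page; texts_for is an illustrative helper name.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Return the text of every element matching $selector (empty list if none).
# texts_for is a hypothetical helper name.
sub texts_for {
    my ($html, $selector) = @_;
    return Mojo::DOM->new($html)->find($selector)->map('text')->each;
}

# Hypothetical markup mimicking the structure targeted by the one-liner.
my $html = '<div id="cpansearch"><div class="pod">'
         . '<p>First answer.</p><p>Second answer.</p></div></div>';

my $dom     = Mojo::DOM->new($html);
my $missing = $dom->at('#u');    # nothing has id "u" here, so this is undef
print "no element matches #u\n" unless defined $missing;

print "$_\n" for texts_for($html, '#cpansearch > div.pod > p');
```

Checking the result of `at` with `defined` before calling a method on it avoids the "undefined value" error.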

WWW::Mechanize Extraction Help - PERL

I'm trying to automate the extraction of a transcript found on a website. The entire transcript is found between dl tags, since the site formatted the interview as a description list. The script I have below lets me search the site and extract the text in plain-text format, but I actually want it to include everything between the dl tags, meaning the dd's, dt's, etc. This will allow us to develop our own CSS for the interview.
Something to note about the page is that there are break statements inserted at various points during the interview. Some tools we've found that extract information from webpages using pairings have found this to be a problem since it only grabs the information up until the break statement. Just something to keep in mind if you point me in a different direction. Here's what I have so far.
#!/usr/bin/perl -w
use strict;
use WWW::Mechanize;
use WWW::Mechanize::TreeBuilder;
my $mech = WWW::Mechanize->new();
WWW::Mechanize::TreeBuilder->meta->apply($mech);
$mech->get("http://millercenter.org/president/clinton/oralhistory/madeleine-k-albright");
# find all <dl> tags
my @list = $mech->find('dl');
foreach ( @list ) {
print $_->as_text();
}
If there is a tool that essentially prints what I have, only this time as HTML, please let me know of it!
Your code is fine, just change the as_text() method to as_HTML() and it will show the content with HTML tags included.
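One detail to be aware of: by default, as_HTML() omits end tags that HTML treats as optional, which may include </dt> and </dd>. Passing an empty hash reference as the third argument asks it to emit every end tag. Below is a minimal sketch using HTML::TreeBuilder directly on a hypothetical snippet (with a br inside a dd, like the page in the question); dl_blocks_as_html is an illustrative helper name.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Return the HTML of every <dl> element, keeping all closing tags.
# dl_blocks_as_html is a hypothetical helper name.
sub dl_blocks_as_html {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    # Third argument {}: treat no end tag as optional, so </dt> and </dd> are kept.
    my @blocks = map { $_->as_HTML('<>&', undef, {}) } $tree->find('dl');
    $tree->delete;    # release the parse tree
    return @blocks;
}

# Hypothetical snippet standing in for the transcript page.
my $html = '<dl><dt>Q</dt><dd>First line<br>second line</dd></dl>';
print for dl_blocks_as_html($html);
```

In the question's script the equivalent call would be `$_->as_HTML('<>&', undef, {})` inside the foreach loop.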

Perl LWP::Simple won't GET some URLs

I am trying to write a basic webscraping program in Perl. For some reason it is not working correctly and I don't have the slightest clue as to why.
Just the first part of my code where I am getting the content (just saving all of the HTML code from the webpage to a variable) does not work with certain websites.
I am testing it by just printing it out, and it does not print anything out with this specific website. It works with some other sites, but not all.
Is there another way of doing this that will work?
#use strict;
use LWP::Simple qw/get/;
use LWP::Simple qw/getstore/;
## Grab a Web page, and throw the content in a Perl variable.
my $content = get("https://jobscout.lhh.com/Portal/Page/ResumeProfile.aspx?Mode=View&ResumeId=53650");
print $content;
You have a badly-written web site there. The request times out with a 500 Internal Server Error.
I can't suggest how to get around it, but the site almost certainly uses JavaScript as well which LWP doesn't support, so I doubt if an answer would be much use to you.
Update
It looks like the site has been written so that it goes crazy if there is no Accept-Language header in the request.
The full LWP::UserAgent module is necessary to set it up, like this
use strict;
use warnings;
use LWP;
my $ua = LWP::UserAgent->new(timeout => 10);
my $url = 'https://jobscout.lhh.com/Portal/Page/ResumeProfile.aspx?Mode=View&ResumeId=53650';
my $resp = $ua->get($url, accept_language => 'en-gb,en', );
print $resp->status_line, "\n\n";
print $resp->decoded_content;
This returns with a status of 200 OK and some HTML.
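If several requests go to the same site, the header can also be set once on the agent instead of on each call. This is a small sketch using default_header from LWP::UserAgent; make_agent is a hypothetical helper name, and the network request itself is left as a comment.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Build an agent that adds Accept-Language to every request it sends.
# make_agent is a hypothetical helper name.
sub make_agent {
    my ($languages) = @_;
    my $ua = LWP::UserAgent->new(timeout => 10);
    $ua->default_header('Accept-Language' => $languages);
    return $ua;
}

my $ua = make_agent('en-gb,en');
print 'Accept-Language: ', $ua->default_header('Accept-Language'), "\n";

# Every request made with $ua now carries the header, for example:
# my $resp = $ua->get($url);
```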
To interact with a website that uses JavaScript, I would advise using the WWW::Mechanize::Firefox module:
use strict;
use warnings;
use WWW::Mechanize::Firefox;
my $url = "https://jobscout.lhh.com/Portal/Page/ResumeProfile.aspx?Mode=View&ResumeId=53650";
my $mech = WWW::Mechanize::Firefox->new();
$mech->get($url);
print $mech->status();
my $content = $mech->content();

Save a pdf file that's been opened in Internet Explorer with OLE and Perl

I am looking for a way to use Perl to open a PDF file in Internet Explorer and then save it.
(I want the user to be able to interact with the script and decide whether downloading occurs, which is why I want the PDF to be displayed in IE, so I cannot use something like LWP::Simple.)
As an example, this code loads (displays) a pdf, but I can't figure out how to get Perl to tell IE to save the file.
use Win32::OLE;
my $ie = Win32::OLE->new("InternetExplorer.Application");
$ie->{Visible} = 1;
Win32::OLE->WithEvents($ie);
$ie->Navigate('http://www.aeaweb.org/Annual_Meeting/pdfs/2014_Registration.pdf');
I think I might need to use the OLE method execWB, but I haven't been able to figure it out.
What you want to do is automate the Internet Explorer UI. There are many libraries out there that will do this. You tell the library to find your window of interest, and then you can send keystrokes or commands to the window (CTRL-S in your case).
A good overview on how to do this in Perl is located here.
Example syntax:
my @keys = ( "%{F}", "{RIGHT}", "E", );
for my $key (@keys) {
SendKeys( $key, $pause_between_keypress );
}
The code starts with an array containing the keypresses. Note the
format of the first three elements. The keypresses are: Alt+F, right
arrow, and E. With the application open, this navigates the menu in
order to open the editor.
Another option is to use LWP:
use LWP::Simple;
my $url = 'http://www.aeaweb.org/Annual_Meeting/pdfs/2014_Registration.pdf';
my $file = '2014_Registration.pdf';
getstore($url, $file);
For ExecWB, here is a good thread, although it was never resolved: http://www.perlmonks.org/?node_id=477361
$IE->ExecWB($OLECMDID_SAVEAS, $OLECMDEXECOPT_DONTPROMPTUSER,
$Target);
Why don't you display the PDF in IE then close the IE and save the file using LWP?
You could use Selenium and the perl remote drivers to manage IE
http://search.cpan.org/~aivaturi/Selenium-Remote-Driver-0.15/lib/Selenium/Remote/Driver.pm
http://docs.seleniumhq.org/projects/webdriver/
You will also need to download the IE Selenium driver separately; only the Firefox driver is included as standard.
https://code.google.com/p/selenium/wiki/InternetExplorerDriver
use Selenium::Remote::Driver;
my $driver = new Selenium::Remote::Driver;
$driver->get('http://www.google.com');
print $driver->get_title();
$driver->quit();