Perl script to automate a website for bioinformatics

I would like to automate this website with a Perl script:
http://bioinfo.uni-plovdiv.bg/microinspector/
This is what I have so far, and I am not sure how to get to the output page after this. I know it has something to do with POST, redirect_ok?, response(), but I am not sure. I read through the documentation but am confused about some things. Thanks.
use strict;
use warnings;
use WWW::Mechanize;

# create object for browser
my $browser = WWW::Mechanize->new();

my ($sequence, $results);

open (DRG, "<microRNA_target_cspg_drg_output.fa") || die "cannot open microRNA_target_cspg_drg_output.fa";
while (<DRG>) {
    chomp;
    $sequence = $_;
    last;    # for testing purposes
}
close (DRG);

$browser->get("http://bioinfo.uni-plovdiv.bg/microinspector/");
$browser->form_number(1);
$browser->field("target_sequence", $sequence);
$browser->field("Choose an organism : ", "Mus musculus");
$browser->click_button( number => 1 );

You should start with WWW::Mechanize. Its page provides examples of submitting forms and anything else you will need.
EDIT: As a reply to your update: if you want to get the content of the page, use the content method, as in this example:
my $content = $browser->content();
See the WWW::Mechanize documentation for more info.
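For reference, here is a minimal sketch of how the whole submit-and-read flow could look with WWW::Mechanize. The field name and site URL are taken from the question; the sequence is a placeholder, and the organism field is omitted because its real form field name isn't shown, so treat this as a sketch rather than a finished solution:
use strict;
use warnings;
use WWW::Mechanize;

my $browser  = WWW::Mechanize->new();
my $sequence = 'AUGGCGGCGUAGC';    # placeholder query sequence

$browser->get("http://bioinfo.uni-plovdiv.bg/microinspector/");

# Fill in the first form on the page (field name taken from the question).
$browser->form_number(1);
$browser->field("target_sequence", $sequence);

# Clicking the button performs the POST; Mechanize then holds the result page.
$browser->click_button( number => 1 );
die "Submission failed: " . $browser->status() unless $browser->success();

# The output page is now the current page.
my $results = $browser->content();
print $results;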


Perl: Open a file from a URL

I wanted to know how to open a file from a URL rather than a local file and I found the following answer on another thread:
use IO::String;
use LWP::Simple qw(get);    # get() comes from LWP::Simple

my $handle = IO::String->new(get("google.com"));
my @lines = <$handle>;
close $handle;
This works perfectly... on my PC...
But when I transferred the code over to my hosted server, it complains that it can't find the IO module. So is there another way to open a file from a URL that doesn't require any external modules (or uses one that is pretty much installed on every server)?
You can install PerlIO::http, which will give you an input layer for opening a filehandle from a URL via open. This thing is not included in the Perl core, but it will work with Perls as early as 5.8.9.
Once you've installed it, all you need to do is open with the :http layer in the mode argument. There is nothing to use here; the layer is loaded automatically.
open my $fh, '<:http', 'https://metacpan.org/recent';
You can then read from $fh like a regular file. Under the hood it will take care of getting the data over the wire.
while (my $line = <$fh>) { ... }
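Putting the two snippets together, a minimal self-contained sketch (assuming PerlIO::http is installed) looks like this:
use strict;
use warnings;

# The :http layer fetches the URL when the handle is opened.
open my $fh, '<:http', 'https://metacpan.org/recent'
    or die "Could not open URL: $!";

while (my $line = <$fh>) {
    print $line;
}

close $fh;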
There is no way to "open a file from a URL" as you ask. Well, I suppose you could throw something together using the progress() callback from LWP::UserAgent, but even then I don't think it would work how you want it to.
But you can make something that looks like it's doing what you want pretty easily. Actually, what we're really doing is pulling all the data back from the URL and then opening a filehandle on a string that contains that data.
use LWP::Simple;

my $data = get('https://google.com');

open my $url_fh, '<', \$data or die $!;

# Now $url_fh is a filehandle wrapped around your data.
# Treat it like any other filehandle.
while (<$url_fh>) {
    print;
}
Your problem was that IO::String wasn't installed. But there's no need to install it, as it's simple enough to do what it does with standard Perl features (simply open a filehandle on a reference to a string).
Update: IO::String is completely unnecessary here. Not only because you can do what it does very simply, by just opening a filehandle on a reference to your string, but also because all you want to do is to read a file from a web site into an array. And in that case, your code is simply:
use LWP::Simple;
my $url = 'something';
my @records = split /\n/, get($url);
You might even consider adding some error handling.
use LWP::Simple;

my $url  = 'something';
my $data = get($url);
die "No data found\n" unless defined $data;
my @records = split /\n/, $data;

Get the whole content from a web site using a Perl script

I am a new hire at my company, and this is the first time I am working with Perl.
I have been given a task in which I need to find the IP reputation from this link: https://www.talosintelligence.com/reputation_center/lookup?search=27.34.246.62
But in Perl, when I use:
#!/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new( autocheck => 1 );
my $url = "https://www.talosintelligence.com/reputation_center/lookup?search=27.34.246.62";
$mech->get($url);
print $mech->status();

my $content = $mech->content();

open FILE1, ">./Reports/Reputation.txt" or die "Cannot open Reputation.txt!";
print FILE1 ($content);
close FILE1;

print "\nIP Reputation Report Generated \n";
I don't get the whole content. What can I do to get this?
The content is loaded by JavaScript, so you can't crawl it using simple methods.
There are two options for this kind of situation.
1) Some API holds the original data, and JavaScript loads and formats it on the front end. If you want to parse JavaScript-rendered content, try WWW::Mechanize::Firefox.
2) Try to figure out where the data is loaded from. For your IP, the following link has the corresponding data in JSON format, so parse the content using the JSON module; it is much simpler than using regexes. A sketch follows after the link.
https://www.talosintelligence.com/sb_api/query_lookup?query=%2Fapi%2Fv2%2Frelated_ips%2Fip%2F&query_entry=27.34.246.62
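A minimal sketch of fetching that endpoint and parsing it with the JSON module. The URL is the one given above; the structure of the returned JSON isn't documented here, so inspect the decoded data (for example with Data::Dumper) before relying on specific keys:
use strict;
use warnings;
use LWP::UserAgent;
use JSON qw(decode_json);
use Data::Dumper;

my $ua  = LWP::UserAgent->new( timeout => 10 );
my $url = 'https://www.talosintelligence.com/sb_api/query_lookup'
        . '?query=%2Fapi%2Fv2%2Frelated_ips%2Fip%2F&query_entry=27.34.246.62';

my $resp = $ua->get($url);
die "Request failed: ", $resp->status_line unless $resp->is_success;

# Decode the JSON body into a Perl data structure.
my $data = decode_json( $resp->decoded_content );

# Inspect the structure first, then pull out the fields you need.
print Dumper($data);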

Perl LWP::Simple won't GET some URLs

I am trying to write a basic web-scraping program in Perl. For some reason it is not working correctly, and I don't have the slightest clue as to why.
The first part of my code, where I get the content (saving all of the HTML from the webpage to a variable), does not work with certain websites.
I am testing it by just printing the variable out, and it prints nothing for this specific website. It works with some other sites, but not all.
Is there another way of doing this that will work?
#use strict;
use LWP::Simple qw/get/;
use LWP::Simple qw/getstore/;
## Grab a Web page, and throw the content in a Perl variable.
my $content = get("https://jobscout.lhh.com/Portal/Page/ResumeProfile.aspx?Mode=View&ResumeId=53650");
print $content;
You have a badly-written web site there. The request times out with a 500 Internal Server Error.
I can't suggest how to get around it, but the site almost certainly uses JavaScript as well which LWP doesn't support, so I doubt if an answer would be much use to you.
Update
It looks like the site has been written so that it goes crazy if there is no Accept-Language header in the request.
The full LWP::UserAgent module is necessary to set it up, like this
use strict;
use warnings;
use LWP;
my $ua = LWP::UserAgent->new(timeout => 10);
my $url = 'https://jobscout.lhh.com/Portal/Page/ResumeProfile.aspx?Mode=View&ResumeId=53650';
my $resp = $ua->get($url, accept_language => 'en-gb,en', );
print $resp->status_line, "\n\n";
print $resp->decoded_content;
This returns with a status of 200 OK and some HTML.
To interact with a website that uses JavaScript, I would advise that you use the following module: WWW::Mechanize::Firefox
use strict;
use warnings;
use WWW::Mechanize::Firefox;
my $url = "https://jobscout.lhh.com/Portal/Page/ResumeProfile.aspx?Mode=View&ResumeId=53650";
my $mech = WWW::Mechanize::Firefox->new();
$mech->get($url);
print $mech->status();
my $content = $mech->content();

Parsing an Internet page line after line

Firstly I'd like to apologize - I'm new to Perl, and my question is so basic that I am almost sure it has been asked before, but sadly I couldn't find it.
I'd like to parse a web page the way I parse a text file opened with open my $file, "<", "...". That is, I'd like to use a loop: while (my $line = <$file>). Sadly I couldn't find a way to do that; I only found LWP::UserAgent with its get and content methods, but that gives me the whole page at once. I could make an array out of it by splitting on \n, but I really want the convenience of <$file>.
What can I do?
Thank you very much, and sorry again if it has been asked before.
Here is one way to do this:
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
my $response = $ua->get('http://search.cpan.org/');

# assume a successful response
open(my $fh, "<", \$response->decoded_content);
while (<$fh>) {
    print "line $. has ", length($_), " characters\n";
}
# $fh will close when it goes out of scope.

How can I download link targets from a web site using Perl?

I just made a script that grabs links from a website and saves them into a text file.
Now I'm working on my regexes so it will grab, from the text file, links which contain php?dl= in the URL:
E.g.: www.example.com/site/admin/a_files.php?dl=33931
It's pretty much the address you get when you hover over the dl button on the site, which you can then click to download or "right click, save".
I'm just wondering how to achieve this: downloading the content of the given address, which returns a *.txt file, all from the script of course.
Make WWW::Mechanize your new best friend.
Here's why:
It can identify links on a webpage that match a specific regex (/php\?dl=/ in this case)
It can follow those links through the follow_link method
It can get the targets of those links and save them to file
All this without needing to save your wanted links in an intermediate file! Life's sweet when you have the right tool for the job...
Example
use strict;
use warnings;
use WWW::Mechanize;

my $url = 'http://www.example.com/';
my $mech = WWW::Mechanize->new();
$mech->get( $url );

my @linksOfInterest = $mech->find_all_links( text_regex => qr/php\?dl=/ );

my $fileNumber = 1;

foreach my $link (@linksOfInterest) {
    $mech->get( $link, ':content_file' => "file" . $fileNumber++ . ".txt" );
    $mech->back();
}
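If you would rather follow a matching link directly instead of collecting them all first, the follow_link method mentioned above takes the same kind of regex. A minimal sketch, with an illustrative URL and pattern:
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
$mech->get('http://www.example.com/');

# Follow the first link whose URL matches the pattern, then save the response body.
$mech->follow_link( url_regex => qr/php\?dl=/, n => 1 );
$mech->save_content('file1.txt');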
You can download the file with LWP::UserAgent:
my $ua = LWP::UserAgent->new();
my $response = $ua->get($url, ':content_file' => 'file.txt');
Or if you need a filehandle:
open my $fh, '<', $response->content_ref or die $!;
Old question, but when I'm doing quick scripts, I often use "wget" or "curl" and pipe. This isn't cross-system portable, perhaps, but if I know my system has one or the other of these commands, it's generally good.
For example:
#!/usr/bin/env perl
use strict;
use warnings;

open my $fp, "curl http://www.example.com/ |" or die "Could not run curl: $!";
while (<$fp>) {
    print;
}