How to use Perl to get information about a video?

Given a video URL, I want to:
- verify whether the input URL was obtained
- get the URL
- see whether the URL was redirected
- get the meta tags as key-value pairs
- get the video tags as a list
- get the URL of the next suggested video
I tried Video::Info, which provides general information. However, I don't know how to get things like whether the URL was redirected, the tags, and the next video. Any help is appreciated.
Thanks

Take a look at the CGI module. Perhaps this will get you started with the general process:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use CGI;
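# Parse a query string as if it were the request's parameters
my $cgi = CGI->new('metadata=abc&nextVideo=efg&foo=bar&blah=etc');
# With no arguments, param() returns the names of all parameters
my @URLQueryKeys = $cgi->param;
foreach my $URLQueryKey (@URLQueryKeys) {
    print Dumper $URLQueryKey;
    # scalar() avoids CGI's warning about calling param() in list context
    print Dumper scalar $cgi->param($URLQueryKey);
}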
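The CGI approach only covers parsing a query string. For the "see whether the URL was redirected" part, here is a minimal sketch using HTTP::Tiny, which is bundled with Perl since 5.14 (HTTPS additionally requires IO::Socket::SSL and Net::SSLeay). HTTP::Tiny's response hashref holds the final URL in `url` and, only when redirects were followed, the earlier responses in `redirects`. The helper name `redirect_info` is hypothetical, and the script only makes a request when you pass a URL as its first argument.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;

# Summarise redirect information from an HTTP::Tiny response hashref.
# (redirect_info is a hypothetical helper, not part of HTTP::Tiny.)
sub redirect_info {
    my ( $requested_url, $res ) = @_;
    # 'redirects' only exists when at least one redirect was followed
    my @hops = @{ $res->{redirects} || [] };
    return {
        requested  => $requested_url,
        final_url  => $res->{url},
        redirected => @hops ? 1 : 0,
        hops       => scalar @hops,
        # each earlier response keeps its own (lower-cased) Location header
        chain      => [ map { $_->{headers}{location} } @hops ],
    };
}

if ( my $url = shift @ARGV ) {
    my $res = HTTP::Tiny->new( max_redirect => 5 )->get($url);
    die "Request failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    my $info = redirect_info( $url, $res );
    printf "final URL: %s\nredirected: %s (%d hop(s))\n",
        $info->{final_url}, $info->{redirected} ? 'yes' : 'no', $info->{hops};
}
```

Run it as `perl redirect_check.pl 'http://example.com/some/video'`; comparing `final_url` with the URL you passed in is the simplest redirect test.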

Related

Get the whole content of a website using a Perl script

I am a new hire at my company and this is my first time working with Perl.
I have a task in which I need to look up an IP reputation from this link: https://www.talosintelligence.com/reputation_center/lookup?search=27.34.246.62
But when I use this Perl script:
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
my $mech = WWW::Mechanize->new( autocheck => 1 );
my $url = "https://www.talosintelligence.com/reputation_center/lookup?search=27.34.246.62";
$mech->get($url);
print $mech->status(), "\n";
my $content = $mech->content();
# Open the report once (the ./Reports directory must already exist)
open my $report, '>', './Reports/Reputation.txt'
    or die "Cannot open ./Reports/Reputation.txt: $!";
print {$report} $content;
close $report;
print "\nIP Reputation Report Generated\n";
I don't get the whole content. What can I do to get it?
The content is loaded by JavaScript, and WWW::Mechanize does not run JavaScript, so you can't scrape it with simple methods.
There are two options in this kind of situation:
1) If you want to parse the content as it looks after the JavaScript has run, try WWW::Mechanize::Firefox, which drives a real browser.
2) Often the original data comes from an API, and the JavaScript only loads and formats it in the front end. Try to figure out where it is loaded from (for example, in the network tab of your browser's developer tools). For your IP, the following link returns the corresponding data as JSON, so parse the content with the JSON module; that is much simpler than using regexes.
https://www.talosintelligence.com/sb_api/query_lookup?query=%2Fapi%2Fv2%2Frelated_ips%2Fip%2F&query_entry=27.34.246.62
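Option 2 could look roughly like the following sketch, which fetches the JSON link above with HTTP::Tiny and decodes it with JSON::PP (both bundled with Perl; HTTPS also needs IO::Socket::SSL). The structure of the returned JSON isn't described here, so the script simply dumps it. `lookup_url` and `decode_lookup` are hypothetical helper names, and because the endpoint was found by watching the page's own requests rather than in documentation, it may change.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;
use JSON::PP qw(decode_json);
use Data::Dumper;

# Build the lookup URL from the answer for a given IP (hypothetical helper).
# The endpoint is undocumented as far as this sketch knows; it may change.
sub lookup_url {
    my ($ip) = @_;
    return 'https://www.talosintelligence.com/sb_api/query_lookup'
         . '?query=%2Fapi%2Fv2%2Frelated_ips%2Fip%2F&query_entry=' . $ip;
}

# Decode a UTF-8 JSON body into a Perl data structure (dies on invalid JSON).
sub decode_lookup {
    my ($body) = @_;
    return decode_json($body);
}

if ( my $ip = shift @ARGV ) {
    my $res = HTTP::Tiny->new( timeout => 10 )->get( lookup_url($ip) );
    die "Request failed: $res->{status} $res->{reason}\n" unless $res->{success};
    # The shape of the JSON is not known here, so just dump the whole structure
    print Dumper( decode_lookup( $res->{content} ) );
}
```

Once you have seen the dumped structure, you can pick out the fields you need with normal hash and array dereferencing instead of regexes.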

Fetch a URL 100 times using Perl

I need to request one URL (I can't share the exact link; it performs a request and looks like http://link.com/?name=name&password=password& and so on).
I need to fetch this URL 100 times in a row. Doing this manually in a browser takes too much time.
Is there a way to run this link 100 times in a row with a Perl script (just run it, as if I pasted the link into a browser and pressed Enter)?
I haven't used Perl before, so I'm asking for help directly. I googled some information and wrote a small script, but it seems I'm missing something:
#!/usr/bin/perl -w
use LWP::Simple;
my $uri = 'http://my link here';
my $content = get $uri;
Could you please advise me how to finish this script?
Use a (simple) for loop.
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
my $uri = 'http://my link here';
get $uri for 1 .. 100;
Update: I just read in a comment that you don't care about the returned data, so I've edited my answer to remove the unnecessary $content variable.
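If you do want to know whether each of the 100 requests worked, here is a sketch with HTTP::Tiny (bundled with Perl) that counts successes and warns about failures. `fetch_repeatedly` is a hypothetical helper; it accepts any object with an HTTP::Tiny-style get() method that returns a hashref with a `success` key, which also makes it easy to test without a network.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;

# Request $url $times times and return how many requests succeeded.
# $ua can be any object whose get() returns an HTTP::Tiny-style hashref.
sub fetch_repeatedly {
    my ( $ua, $url, $times ) = @_;
    my $ok = 0;
    for my $n ( 1 .. $times ) {
        my $res = $ua->get($url);
        if ( $res->{success} ) {
            $ok++;
        }
        else {
            warn "Request $n failed: $res->{status} $res->{reason}\n";
        }
    }
    return $ok;
}

if ( my $url = shift @ARGV ) {
    my $ok = fetch_repeatedly( HTTP::Tiny->new( timeout => 10 ), $url, 100 );
    print "$ok of 100 requests succeeded\n";
}
```

Quote the URL on the command line (`perl fetch100.pl 'http://...'`) so the shell doesn't interpret the `&` characters.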

Automatic Search Using WWW::Mechanize

I am trying to write a Perl script which will automatically key in search variables on this LexisNexis search page and retrieve the search results.
I am using the WWW::Mechanize module, but I am not sure how to figure out the field name of the search bar itself. This is the script I have so far:
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
my $m = WWW::Mechanize->new();
my $url = "http://www.lexisnexis.com/hottopics/lnacademic/?verb=sr&csi=379740";
$m->get($url);
$m->form_name('f');
$m->field('q', 'Test');
my $response = $m->submit();
print $response->content();
However, I think the "Name" of the search box on this website is not "q". I am getting the following error: "Can't call method "value" on an undefined value at site/lib/WWW/Mechanize.pm line 1442." Any help is much appreciated. Thank you!
If you disable JavaScript in your browser, you will notice that the search form doesn't load. That means it is created by JavaScript, and since WWW::Mechanize doesn't run JavaScript, the form is not in the HTML that Mechanize receives. Have a look at WWW::Mechanize::Firefox; it might help with your task. Check out its example scripts, cookbook and FAQ.
You can also do the same with Selenium; see Gabor's tutorial on Selenium.
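To check that diagnosis, you can list the forms and field names in the HTML that Mechanize actually received. A minimal sketch using HTML::Form, the class WWW::Mechanize uses for its form objects; `describe_forms` is a hypothetical helper name. The form numbers it prints are 1-based, like Mechanize's form_number(). For a page whose form is built by JavaScript, you would expect the search field to be missing from the output.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Form;

# Describe every named input of every form in an HTML string as
# "<form number> <form name>: <input name> (<input type>)".
# $base is the page URL; HTML::Form needs it to resolve relative actions.
sub describe_forms {
    my ( $html, $base ) = @_;
    my @lines;
    my $i = 0;
    for my $form ( HTML::Form->parse( $html, $base ) ) {
        $i++;
        my $form_name = $form->attr('name') // '(unnamed)';
        for my $input ( $form->inputs ) {
            next unless defined $input->name;    # e.g. unnamed submit buttons
            push @lines, sprintf '%d %s: %s (%s)',
                $i, $form_name, $input->name, $input->type;
        }
    }
    return @lines;
}
```

With the question's script you could call `print "$_\n" for describe_forms( $m->content, $m->uri );` right after `$m->get($url)`. WWW::Mechanize also installs a `mech-dump` command-line tool (`mech-dump --forms URL`) that prints similar information.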

Perl LWP::Simple won't GET some URLs

I am trying to write a basic webscraping program in Perl. For some reason it is not working correctly and I don't have the slightest clue as to why.
Even the first part of my code, where I get the content (just saving all of the HTML from the web page to a variable), does not work with certain websites.
I am testing it by just printing it out, and it does not print anything out with this specific website. It works with some other sites, but not all.
Is there another way of doing this that will work?
use strict;
use warnings;
use LWP::Simple qw/get/;
## Grab a web page and put its content in a Perl variable.
my $content = get("https://jobscout.lhh.com/Portal/Page/ResumeProfile.aspx?Mode=View&ResumeId=53650");
# get() returns undef when the request fails instead of dying
die "Couldn't get the page\n" unless defined $content;
print $content;
You have a badly written website there. The request fails with a 500 Internal Server Error.
I can't suggest how to get around it, but the site almost certainly uses JavaScript as well, which LWP doesn't support, so I doubt an answer would be much use to you.
Update
It looks like the site fails if the request has no Accept-Language header.
You need the full LWP::UserAgent module to add that header, like this:
use strict;
use warnings;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new(timeout => 10);
my $url = 'https://jobscout.lhh.com/Portal/Page/ResumeProfile.aspx?Mode=View&ResumeId=53650';
# Key/value pairs after the URL are sent as request headers
my $resp = $ua->get($url, 'Accept-Language' => 'en-gb,en');
print $resp->status_line, "\n\n";
print $resp->decoded_content;
This returns a status of 200 OK and some HTML.
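The same Accept-Language fix can be sketched with HTTP::Tiny, which ships with Perl, by setting the header once through its `default_headers` attribute. `make_client` is a hypothetical helper name, the script only fetches when given a URL argument, and HTTPS URLs like the one above need IO::Socket::SSL installed. Whether the site still requires this header has not been re-checked here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;

# Build a client that sends an Accept-Language header with every request.
sub make_client {
    return HTTP::Tiny->new(
        timeout         => 10,
        default_headers => { 'Accept-Language' => 'en-gb,en' },
    );
}

if ( my $url = shift @ARGV ) {
    my $res = make_client()->get($url);
    print "$res->{status} $res->{reason}\n\n";
    # content holds the raw (undecoded) body bytes
    print $res->{content} if $res->{success};
}
```

The design choice is the same as in the LWP answer: the header is what matters, not the library, so any client that lets you set request headers should work.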
To interact with a website that uses JavaScript, I would advise using the WWW::Mechanize::Firefox module:
use strict;
use warnings;
use WWW::Mechanize::Firefox;
my $url = "https://jobscout.lhh.com/Portal/Page/ResumeProfile.aspx?Mode=View&ResumeId=53650";
# Controls a Firefox instance (through the MozRepl extension), so the page's JavaScript runs
my $mech = WWW::Mechanize::Firefox->new();
$mech->get($url);
print $mech->status(), "\n";
my $content = $mech->content();
print $content;

Perl script to automate a website for bioinformatics

I would like to automate this website with a Perl script:
http://bioinfo.uni-plovdiv.bg/microinspector/
This is what I have so far. I am not sure how to get to the output page after this; I know it has something to do with POST, redirect_ok() or response(), but I am not sure how. I read through the documentation but am confused about some things. Thanks.
use strict;
use warnings;
use WWW::Mechanize;
# create object for browser
my $browser = WWW::Mechanize->new();
my ($sequence, $results);
open (my $drg, '<', 'microRNA_target_cspg_drg_output.fa') or die "cannot open microRNA_target_cspg_drg_output.fa: $!";
while (<$drg>) {
chomp;
$sequence=$_;
last; #for testing purposes
}
close ($drg);
$browser->get("http://bioinfo.uni-plovdiv.bg/microinspector/");
$browser->form_number(1);
$browser->field("target_sequence", $sequence);
$browser->field("Choose an organism : ", "Mus musculus");
$browser->click_button( number => 1);
You should start with WWW::Mechanize. Its documentation provides examples of submitting forms and anything else you will need.
EDIT: In reply to your update: to get the content of the resulting page, use the content method, as in this example:
my $content = $browser->content();
See this for more info.
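One more detail in the question's script: the loop keeps only the first line of the file, and in a standard FASTA file that first line is the `>` header, not the sequence. Assuming the input is standard FASTA, here is a core-Perl sketch that returns the first record's header and its sequence joined across lines. `first_fasta_sequence` is a hypothetical helper name, and whether the microinspector form expects the bare sequence or the whole FASTA record is an assumption to check against the site.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read the first record from a FASTA filehandle: the text after ">" on the
# header line, plus all following sequence lines up to the next header.
sub first_fasta_sequence {
    my ($fh) = @_;
    my ( $header, $sequence );
    while ( my $line = <$fh> ) {
        chomp $line;
        $line =~ s/\r\z//;              # tolerate Windows line endings
        next if $line =~ /^\s*$/;       # skip blank lines
        if ( $line =~ /^>(.*)/ ) {
            last if defined $header;    # a second header ends the first record
            $header = $1;
            next;
        }
        $sequence .= $line;
    }
    return ( $header, $sequence );
}

if ( my $file = shift @ARGV ) {
    open my $fh, '<', $file or die "cannot open $file: $!";
    my ( $header, $sequence ) = first_fasta_sequence($fh);
    close $fh;
    print "$header\n$sequence\n";
}
```

The returned `$sequence` is what you would pass to `$browser->field("target_sequence", $sequence)` in the question's script.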