How can I extract links from an HTML file with Perl? - perl

I have some input with a link and I want to open that link. For instance, I have an HTML file and want to find all links in the file and open their contents in an Excel spreadsheet.

It sounds like you want the linktractor script from my HTML::SimpleLinkExtor module.
You might also be interested in my webreaper script. I wrote that a long, long time ago to do something close to this same task. I don't really recommend it because other tools are much better now, but you can at least look at the code.
CPAN and Google are your friends. :)
Mojo::UserAgent is quite nice for this, too:
use Mojo::UserAgent;
print Mojo::UserAgent
->new
->get( $ARGV[0] )
->res
->dom->find( "a" )
->map( attr => "href" )
->join( "\n" );

That sounds like a job for WWW::Mechanize. It provides a fairly high level interface to fetching and studying web pages.
Once you've read the docs, I think you'll have a good idea how to go about it.
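As a rough sketch of what that could look like, assuming WWW::Mechanize is installed from CPAN: the helper name link_urls is made up for illustration, while new, get, links and url_abs are methods the module documents. Note that links returns more than plain anchors (frames and a few other tags as well), so you may still want to filter the result.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: fetch a page and return every link as an absolute URL.
sub link_urls {
    my ($url) = @_;
    my $mech = WWW::Mechanize->new( autocheck => 1 );   # die on fetch errors
    $mech->get($url);
    return map { $_->url_abs->as_string } $mech->links;
}

# Only fetch when a URL is given on the command line.
if (@ARGV) {
    print "$_\n" for link_urls( $ARGV[0] );
}
```

Run it as `perl links.pl http://www.example.com/` (the script name is arbitrary) to get one URL per line.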

There is also Web::Query:
#!/usr/bin/env perl
use 5.10.0;
use strict;
use warnings;
use Web::Query;
say for wq( shift )->find('a')->attr('href');
Or, from the cli:
$ perl -MWeb::Query -E'say for wq(shift)->find("a")->attr("href")' \
http://techblog.babyl.ca

I've used URI::Find for this in the past (for when the file is not HTML).
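A minimal sketch of that, assuming URI::Find is installed from CPAN. The helper name uris_in_text and the sample text are invented for illustration. The callback passed to new receives each URI found together with the original text, and whatever it returns replaces that text, so returning the original leaves the input unchanged.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use URI::Find;

# Hypothetical helper: return every URI found in a chunk of plain text.
sub uris_in_text {
    my ($text) = @_;
    my @found;
    my $finder = URI::Find->new( sub {
        my ( $uri, $original ) = @_;   # URI object, text as it appeared
        push @found, "$uri";           # stringify the URI object
        return $original;              # leave the text unchanged
    } );
    $finder->find( \$text );           # find() takes a reference to the text
    return @found;
}

my $sample = "See http://www.example.com/docs and https://example.org/faq for details";
print "$_\n" for uris_in_text($sample);
```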

Related

Can't call method "text" on an undefined value at sample.pl line 11

use strict; # safety net
use warnings; # safety net
use feature 'say'; # a better "print"
use Mojo;
my $dom = Mojo::DOM->new;
my $ua = Mojo::UserAgent->new;
$dom= $ua->get('http://search.cpan.org/faq.html')->res->dom;
my $desc=$dom->at('#u')->text;
When I run this code, I get the error above. My input data comes from the web page in the code.
I want output like this, with only the answers:
CPAN Search is a search engine for the distributions, modules, docs, and ID's on CPAN. It was conceived and built by Graham Barr as a way to make things easier to navigate. Originally named TUCS [ The Ultimate CPAN Search ] it was later named CPAN Search or Search DOT CPAN.
If you are having technical difficulties with the site itself, send mail to cpansearch@perl.org and try to be as detailed as possible in your note describing the problem.
Can anyone please help me?
Something like this:
perl -Mojo -le 'print r g("http://search.cpan.org/faq.html")->dom("#cpansearch > div.pod > p")->map("text")->to_array;'
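The error itself means at() found no element matching the selector and returned undef. A small sketch of that behaviour, assuming Mojolicious is installed; the inline markup is an invented stand-in for the real page, so the selectors are illustrative only:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Mojo::DOM;

# Inline stand-in for a fetched page; the markup is invented for illustration.
my $html = <<'HTML';
<div id="cpansearch">
  <div class="pod">
    <h2>What is CPAN Search?</h2>
    <p>First answer.</p>
    <h2>Who do I contact?</h2>
    <p>Second answer.</p>
  </div>
</div>
HTML

my $dom = Mojo::DOM->new($html);

# at() returns undef when nothing matches, so test before calling a method.
my $node = $dom->at('#u');
print "No element matches #u\n" unless defined $node;

# find() returns a (possibly empty) collection, so chaining is always safe.
my @answers = $dom->find('#cpansearch > div.pod > p')->map('text')->each;
print "$_\n" for @answers;
```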

How to listen to URL routes in perl

I'm starting my first Perl project and want to know how to listen to different endpoints, e.g. example.com/home. How do you load an HTML page when someone visits this home route?
Just a note that I'm not interested in using a framework for this particular project. Thanks
Well, I guess you could have a CGI program that interprets the path and takes the appropriate action. You could then combine that with a mod_rewrite rule that diverts all requests into that program.
But it's all looking a bit kludgy and a framework would be a much better solution.
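If you do go the CGI route, the dispatching part can be sketched with core Perl only. This assumes the server passes the requested path in the PATH_INFO environment variable (for example after a rewrite to /cgi-bin/router.cgi/home); the script name, the route table and the respond helper are all hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical route table: request path => code that returns the page body.
my %routes = (
    '/home'  => sub { "<h1>Home</h1>\n" },
    '/about' => sub { "<h1>About</h1>\n" },
);

# Turn a request path into a complete CGI response (headers plus body).
sub respond {
    my ($path) = @_;
    $path = '/home' if !defined $path || $path eq '' || $path eq '/';
    my $handler = $routes{$path};
    return "Status: 404 Not Found\r\nContent-Type: text/plain\r\n\r\nNot found\n"
        unless $handler;
    return "Content-Type: text/html\r\n\r\n" . $handler->();
}

# The web server puts the part of the URL after the script name in PATH_INFO.
print respond( $ENV{PATH_INFO} );
```

Each new route is one more entry in the table, which is roughly what a framework's router does for you with less hand-written code.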
The simplest way to talk to a server is CGI.
This is not Perl specific, but Perl was commonly used for it. It is very slow, but simple.
Here is a small demo. Put this in the cgi-bin directory of your server, go to http://www.example.com/cgi-bin/cgidemo.cgi, and back pops the content of the Perl @INC array.
To hook it up to /home you could alias it in your .htaccess file.
Of course, this is all ancient and slow stuff and has been far surpassed and sped up by FastCGI, mod_perl, and lots of other stuff. I like the Mojolicious framework myself.
#!/usr/bin/perl
# cgidemo.cgi - minimal CGI program
use strict;
use warnings;
# Headers
print "Content-type: text/plain\n";
# Blank line after header
print "\n";
# Body
print "Perl Include Path:\n";
print join("\n", @INC), "\n";

What Perl module can I use to test CGI output for common errors?

Is there a Perl module which can test the CGI output of another program? E.g. I have a program
x.cgi
(this program is not in Perl) and I want to run it from program
test_x_cgi.pl
So, e.g. test_x_cgi.pl is something like
#!perl
use IPC::Run3;
run3 (("x.cgi"), ...)
So in test_x_cgi.pl I want to automatically check that the output of x.cgi doesn't do stupid things like, e.g. print messages before the HTTP header is fully outputted. In other words, I want to have a kind of "browser" in Perl which processes the output. Before I try to create such a thing myself, is there any module on CPAN which does this?
Please note that x.cgi here is not a Perl script; I am trying to write a test framework for it in Perl. So, specifically, I want to test a string of output for ill-formedness.
Edit: Thanks
I have already written a module which does what I want, so feel free to answer this question for the benefit of other people, but any further answers are academic as far as I'm concerned.
There's CGI::Test, which looks like what you're looking for. It specifically mentions the ability to test non-Perl CGI programs. It hasn't been updated for a while, but neither has the CGI spec.
There is Test::HTTP. I have not used it, but seems to have an interface that fits your requirements.
$test->header_is($header_name, $value [, $description]);
Compares the response header
$header_name with the value $value
using Test::Builder->is.
$test->header_like($header_name, $regex, [, $description]);
Compares the response header
$header_name with the regex $regex
using Test::Builder->like.
Look at the examples in chapter 16 of the Perl Cookbook:
16.9. Controlling the Input, Output, and Error of Another Program
It uses IPC::Open3.
From the Perl Cookbook; it might have been modified by me, see below.
Example 16.2
cmd3sel - control all three of kids in, out, and error.
use IPC::Open3;
use IO::Select;
$cmd = "grep vt33 /none/such - /etc/termcap";
my $pid = open3(*CMD_IN, *CMD_OUT, *CMD_ERR, $cmd);
$SIG{CHLD} = sub {
    print "REAPER: status $? on $pid\n" if waitpid($pid, 0) > 0
};
#print CMD_IN "test test 1 2 3 \n";
close(CMD_IN);
my $selector = IO::Select->new();
$selector->add(*CMD_ERR, *CMD_OUT);
while (my @ready = $selector->can_read) {
    foreach my $fh (@ready) {
        if (fileno($fh) == fileno(CMD_ERR)) {print "STDERR: ", scalar <CMD_ERR>}
        else {print "STDOUT: ", scalar <CMD_OUT>}
        $selector->remove($fh) if eof($fh);
    }
}
close(CMD_OUT);
close(CMD_ERR);
If you want to check that the output of x.cgi is properly formatted HTML/XHTML/XML/etc, why not run it through the W3 Validator?
You can download the source and find some way to call it from your Perl test script. Or, you might be able to leverage this Perl interface to calling the W3 Validator on the web.
If you want to write a testing framework, I'd suggest taking a look at Test::More from CPAN as a good starting point. It's powerful but fairly easy to use and is definitely going to be better than cobbling something together as a one-off.
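For the specific check asked about (stray output before the header block is complete), a core-only sketch could look like the following. The function name cgi_header_problems is hypothetical, and requiring a Content-Type header is a simplification: a response that only redirects with Location would be flagged too. The result can be fed into Test::More, for instance by testing that the returned list is empty.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical checker: return a list of problems found in raw CGI output.
sub cgi_header_problems {
    my ($output) = @_;
    my @problems;

    # Headers end at the first blank line (LF or CRLF line endings).
    my ( $head, $body ) = split /\r?\n\r?\n/, $output, 2;
    push @problems, 'no blank line after headers' unless defined $body;

    my $saw_type = 0;
    for my $line ( split /\r?\n/, defined $head ? $head : '' ) {
        if ( $line =~ /^([A-Za-z0-9-]+):\s*\S/ ) {
            $saw_type = 1 if lc $1 eq 'content-type';
        }
        else {
            push @problems, "not a header line: $line";
        }
    }
    push @problems, 'missing Content-Type header' unless $saw_type;
    return @problems;
}

# Usage: capture x.cgi's output (for example with IPC::Run3) and check it.
my $bad = "starting up\nContent-Type: text/html\n\n<p>hi</p>\n";
print "$_\n" for cgi_header_problems($bad);
```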

What is a better way to stream audio with Perl CGI?

For a CS assignment I am using the following code to stream audio. Now I would like to add the ability to stream files successively, as in a playlist. How can I modify my code to accommodate this? I would like to have a text file of filenames that my script passes through sequentially, streaming each. Is this possible? I've spent a good bit of time googling yet found few relevant links.
Thanks,
CB
#!/usr/bin/perl
use strict;
use CGI::Carp qw/fatalsToBrowser/;
open(OGGFILE, "../HW1/OGG/ACDC.ogg") or die "open error";
my $buffer;
print "Content-type: audio/ogg\n\n";
binmode STDOUT;
while( read(OGGFILE, $buffer, 16384)){
print $buffer;
}
close(OGGFILE);
Update:
I've since modified my code to create a playlist and it seems to be working well. However, for this to work, I am storing my music files in my html folder, available for all to see. Is it a simple matter of changing file permissions to prevent direct linking and visibility? Is it possible for me to modify this program so that it streams the files from a folder outside of /html?
Thanks
CB
#!/usr/bin/perl
use strict;
use CGI qw/:standard/;
use CGI::Pretty qw/:standard/;
use CGI::Carp qw/fatalsToBrowser/;
print header(-type=>'audio/x-mpegurl',-expires=>'now');
printf "#EXTM3U\n";
printf "#EXTINF:-1,Some ACDC song\n";
printf "http://www.mywebserver/MP3/ACDC.ogg\n";
printf "#EXTINF:-1,Some Pink Floyd Song\n";
printf "http://www.mywebserver.com/MP3/PinkFloyd.ogg\n";
For the players I've dealt with, I had to provide a specially formatted playlist that listed the sequence of audio files. The player then requested the audio files as it needed them. You'll have one program to serve that playlist, and another to serve individual audio files.
As for your current program, I'd get the Perl program completely out of the way. Just let the web server handle it, which will be much faster. Your program doesn't do anything the web server doesn't already do for you, so don't make it do the extra work. :)
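The playlist-serving half could be sketched like this, generating the M3U from a list of entries instead of hard-coded printf calls. The "file|title" line format and the build_m3u helper are assumptions made for illustration; in the real script the lines would be read from the text file of file names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build an extended M3U playlist from "file|title" lines.
sub build_m3u {
    my ( $base_url, @lines ) = @_;
    my $m3u = "#EXTM3U\n";
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^\s*(?:#|$)/;        # skip comments and blank lines
        my ( $file, $title ) = split /\|/, $line, 2;
        next unless $file =~ /^[\w.-]+\z/;      # plain file names only, no paths
        $title = $file unless defined $title && length $title;
        $m3u .= "#EXTINF:-1,$title\n$base_url/$file\n";
    }
    return $m3u;
}

# In the CGI script these lines would come from the playlist text file.
my @list = ( "ACDC.ogg|Some ACDC song\n", "PinkFloyd.ogg|Some Pink Floyd Song\n" );
print "Content-Type: audio/x-mpegurl\n\n";
print build_m3u( 'http://www.mywebserver.com/MP3', @list );
```

Rejecting anything that is not a plain file name keeps a stray "../" in the list from pointing the player at files outside the music directory.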

Can I rate a song in iTunes (on a Mac) using Perl?

I've tried searching CPAN. I found Mac::iTunes, but not a way to assign a rating to a particular track.
If you're not excited by Mac::AppleScript, which just takes a big blob of AppleScript text and runs it, you might prefer Mac::AppleScript::Glue, which provides a more object-oriented interface. Here's the equivalent to Iamamac's sample code:
#!/usr/bin/env perl
use Modern::Perl;
use Mac::AppleScript::Glue;
use Data::Dumper;
my $itunes = Mac::AppleScript::Glue::Application->new('iTunes');
# might crash if iTunes isn't playing anything yet
my $track = $itunes->current_track;
# for expository purposes, let's see what we're dealing with
say Dumper \$itunes, \$track;
say $track->rating; # initially undef
$track->set(rating => 100);
say $track->rating; # should print 100
All that module does is build a big blob of AppleScript, run it, and then break it all apart into another AppleScript expression that it can use on your next command. You can see that in the _ref value of the track object when you run the above script. Because all it's doing is pasting and parsing AppleScript, this module won't be any faster than any other AppleScript-based approach, but it does allow you to intersperse other Perl commands within your script, and it keeps your code looking a little more like Perl, for what that's worth.
You can write AppleScript to fully control iTunes, and there is a Perl binding Mac::AppleScript.
EDIT Code Sample:
use Mac::AppleScript qw(RunAppleScript);
my $r = 100; # rating to assign, from 0 to 100
RunAppleScript(qq(tell application "iTunes" \n set rating of current track to $r \n end tell));
Have a look at itunes-perl, it seems to be able to rate tracks.