Retrieve online data and generate an xml file - perl

Whenever I run the following Perl script I got the errors below
Use of uninitialized value $date in concatenation (.) or string at D:\sagar\toc\Online_TOC.pl line 111, <> line 1.
Use of uninitialized value $first_page in concatenation (.) or string at D:\sagar\toc\Online_TOC.pl line 111, <> line 1.
Use of uninitialized value $last_page in concatenation (.) or string at D:\sagar\toc\Online_TOC.pl line 111, <> line 1.
The following code is run at the command prmpt by giving URL
http://ajpheart.physiology.org/content/309/11
It generates the meta_issue11.xml file but does not give proper output.
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
use HTML::Parser;
use WWW::Mechanize;
my ( $date, $first_page, $last_page, #toc );
sub get_date {
my ( $self, $tag, $attr ) = #_;
if ( 'span' eq $tag
and $attr->{class}
and 'highwire-cite-metadata-date' eq $attr->{class}
and not defined $date )
{
$self->handler( text => \&next_text_to_date, 'self, text' );
}
elsif ( 'span' eq $tag
and $attr->{class}
and 'highwire-cite-metadata-pages' eq $attr->{class} )
{
if ( not defined $first_page ) {
$self->handler( text => \&parse_first_page, 'self, text' );
}
else {
$self->handler( text => \&parse_last_page, 'self, text' );
}
}
elsif ( 'span' eq $tag
and $attr->{class}
and 'highwire-cite-metadata-doi' eq $attr->{class} )
{
$self->handler( text => \&retrieve_doi, 'self, text' );
}
elsif ( 'div' eq $tag
and $attr->{class}
and $attr->{class} =~ /\bissue-toc-section\b/ )
{
$self->handler( text => \&next_text_to_toc, 'self, text' );
}
}
sub next_text_to_date {
my ( $self, $text ) = #_;
$text =~ s/^\s+|\s+$//g;
$date = $text;
$self->handler( text => undef );
}
sub parse_first_page {
my ( $self, $text ) = #_;
if ( $text =~ /([A-Z0-9]+)(?:-[0-9A-Z]+)?/ ) {
$first_page = $1;
$self->handler( text => undef );
}
}
sub parse_last_page {
my ( $self, $text ) = #_;
if ( $text =~ /(?:[A-Z0-9]+-)?([0-9A-Z]+)/ ) {
$last_page = $1;
$self->handler( text => undef );
}
}
sub next_text_to_toc {
my ( $self, $text ) = #_;
push #toc, [$text];
$self->handler( text => undef );
}
sub retrieve_doi {
my ( $self, $text ) = #_;
if ( 'DOI:' ne $text ) {
$text =~ s/^\s+|\s+$//g;
push #{ $toc[-1] }, $text;
$self->handler( text => undef );
}
}
print STDERR 'Enter the URL: ';
chomp( my $url = <> );
my ( $volume, $issue ) = ( split m(/), $url )[ -2, -1 ];
my $p = 'HTML::Parser'->new(
api_version => 3,
start_h => [ \&get_date, 'self, tagname, attr' ],
);
my $mech = 'WWW::Mechanize'->new( agent => 'Mozilla' );
$mech->get( $url );
my $contents = $mech->content;
$p->parse( $contents );
$p->eof;
my $toc;
for my $section ( #toc ) {
$toc .= "<TocSection>\n";
$toc .= "<Heading>" . shift( #$section ) . "</Heading>\n";
$toc .= join q(), map "<DOI>$_</DOI>\n", #$section;
$toc .= "</TocSection>\n";
}
open( F6, ">meta_issue_$issue.xml" );
print F6 <<"__HTML__";
<!DOCTYPE MetaIssue SYSTEM "http://schema.highwire.org/public/toc/MetaIssue.pubids.dtd">
<MetaIssue volume="$volume" issue="$issue">
<Provider>Cadmus</Provider>
<IssueDate>$date</IssueDate>
<PageRange>$first_page-$last_page</PageRange>
<TOC>$toc</TOC>
</MetaIssue>
__HTML__

The primary problem is that you're checking the class string for equality, whereas the required class may be just one of several space-separated class names
But there are a number of other issues, such as using WWW::Mechanize just to fetch a web page when LWP::Simple will do fine. And checking three times for 'span' eq $tag
Here's a working version. I would prefer to see XML::Writer used to create the output XML, but I have kept to using simple print statements, as in your own code
Note that comments like #/ are just there to persuade the Stack Overflow syntax highlighter to colour the text correctly. You should remove them in the live code
#!/usr/bin/perl
use strict;
use warnings 'all';
use LWP::Simple 'get';
use HTML::Parser;
my ( $date, $first_page, $last_page, #toc );
print 'Enter the URL: ';
my $url = <>;
$url ||= 'http://ajpheart.physiology.org/content/309/11';
chomp $url;
my ( $volume, $issue ) = ( split m(/), $url )[ -2, -1 ]; #/
my $p = 'HTML::Parser'->new(
api_version => 3,
start_h => [ \&get_span_div, 'self, tagname, attr' ],
);
my $contents = get($url);
$p->parse( $contents );
$p->eof;
my $toc = '';
for my $section ( #toc ) {
$toc .= "\n";
$toc .= " <TocSection>\n";
$toc .= " <Heading>" . shift( #$section ) . "</Heading>\n";
$toc .= " <DOI>$_</DOI>\n" for #$section;
$toc .= " </TocSection>";
}
open my $out_fh, '>', "meta_issue_$issue.xml" or die $!;
print { $out_fh } <<"__HTML__";
<!DOCTYPE MetaIssue SYSTEM "http://schema.highwire.org/public/toc/MetaIssue.pubids.dtd">
<MetaIssue volume="$volume" issue="$issue">
<Provider>Cadmus</Provider>
<IssueDate>$date</IssueDate>
<PageRange>$first_page-$last_page</PageRange>
<TOC>$toc
</TOC>
</MetaIssue>
__HTML__
#/
sub get_span_div {
my ( $self, $tag, $attr ) = #_;
my $class = $attr->{class};
my %class;
%class = map { $_ => 1 } split ' ', $class if $class;
if ( $tag eq 'span' ) {
if ( $class{'highwire-cite-metadata-date'} ) {
$self->handler( text => \&next_text_to_date, 'self, text' ) unless $date;
}
elsif ( $class{'highwire-cite-metadata-pages'} ) {
if ( not defined $first_page ) {
$self->handler( text => \&parse_first_page, 'self, text' );
}
else {
$self->handler( text => \&parse_last_page, 'self, text' );
}
}
elsif ( $class{'highwire-cite-metadata-doi'} ) {
$self->handler( text => \&retrieve_doi, 'self, text' );
}
}
elsif ( $tag eq 'div' ) {
if ( $class{'issue-toc-section'} ) {
$self->handler( text => \&next_text_to_toc, 'self, text' );
}
}
}
sub next_text_to_date {
my ( $self, $text ) = #_;
($date = $text) =~ s/^\s+|\s+$//g; #/
$self->handler( text => undef );
}
sub parse_first_page {
my ( $self, $text ) = #_;
return unless $text =~ /(\w+)(-\w+)?/; #/
$first_page = $1;
$self->handler( text => undef );
}
sub parse_last_page {
my ( $self, $text ) = #_;
return unless $text =~ /\w+-(\w+)/; #/
$last_page = $1;
$self->handler( text => undef );
}
sub next_text_to_toc {
my ( $self, $text ) = #_;
push #toc, [ $text ];
$self->handler( text => undef );
}
sub retrieve_doi {
my ( $self, $text ) = #_;
return unless $text =~ /\d+/; #/
$text =~ s/^\s+|\s+$//g;
push #{ $toc[-1] }, $text;
$self->handler( text => undef );
}
output
<!DOCTYPE MetaIssue SYSTEM "http://schema.highwire.org/public/toc/MetaIssue.pubids.dtd">
<MetaIssue volume="309" issue="11">
<Provider>Cadmus</Provider>
<IssueDate>December 1, 2015</IssueDate>
<PageRange>H1793-H1996</PageRange>
<TOC>
<TocSection>
<Heading>CALL FOR PAPERS | Cardiovascular Responses to Environmental Stress</Heading>
<DOI>10.1152/ajpheart.00199.2015</DOI>
</TocSection>
<TocSection>
<Heading>CALL FOR PAPERS | Autophagy in the Cardiovascular System</Heading>
<DOI>10.1152/ajpheart.00709.2014</DOI>
</TocSection>
<TocSection>
<Heading>CALL FOR PAPERS | Mechanisms of Diastolic Dysfunction in Cardiovascular Disease</Heading>
<DOI>10.1152/ajpheart.00608.2015</DOI>
</TocSection>
<TocSection>
<Heading>CALL FOR PAPERS | Small Vessels–Big Problems: Novel Insights into Microvascular Mechanisms of Diseases</Heading>
<DOI>10.1152/ajpheart.00463.2015</DOI>
<DOI>10.1152/ajpheart.00691.2015</DOI>
<DOI>10.1152/ajpheart.00568.2015</DOI>
<DOI>10.1152/ajpheart.00653.2015</DOI>
</TocSection>
<TocSection>
<Heading>CALL FOR PAPERS | Exercise Training in Cardiovascular Disease: Mechanisms and Outcomes</Heading>
<DOI>10.1152/ajpheart.00341.2015</DOI>
</TocSection>
<TocSection>
<Heading>CALL FOR PAPERS | Cardiac Regeneration and Repair: Mechanisms and Therapy</Heading>
<DOI>10.1152/ajpheart.00594.2015</DOI>
</TocSection>
<TocSection>
<Heading>Vascular Biology and Microcirculation</Heading>
<DOI>10.1152/ajpheart.00289.2015</DOI>
<DOI>10.1152/ajpheart.00308.2015</DOI>
<DOI>10.1152/ajpheart.00179.2015</DOI>
</TocSection>
<TocSection>
<Heading>Muscle Mechanics and Ventricular Function</Heading>
<DOI>10.1152/ajpheart.00284.2015</DOI>
<DOI>10.1152/ajpheart.00327.2015</DOI>
</TocSection>
<TocSection>
<Heading>Signaling and Stress Response</Heading>
<DOI>10.1152/ajpheart.00050.2015</DOI>
</TocSection>
<TocSection>
<Heading>Cardiac Excitation and Contraction</Heading>
<DOI>10.1152/ajpheart.00055.2015</DOI>
</TocSection>
<TocSection>
<Heading>Integrative Cardiovascular Physiology and Pathophysiology</Heading>
<DOI>10.1152/ajpheart.00316.2015</DOI>
<DOI>10.1152/ajpheart.00721.2014</DOI>
</TocSection>
<TocSection>
<Heading>Corrigendum</Heading>
<DOI>10.1152/ajpheart.H-zh4-1780-corr.2015</DOI>
</TocSection>
</TOC>
</MetaIssue>

Related

Unable to retrieve multiple column values from file in Perl

I have a file with following contents:
TIME
DATE TIME DAY
191227 055526 FRI
RC DEV SERVER
RC1 SERVER1
RC2 SERVER2
RC3 SERVER3
END
I am fetching argument values from this file, say if I pass DATE as an argument to the script I am getting corresponding value of the DATE. i.e., 191227
When I pass multiple arguments say DATE, DAY I should get values:
DATE=191227
DAY=FRI
But what I am getting here is:
DATE=191227
DAY=NULL
And if I pass RC as an argument I should get:
RC=RC1,RC2,RC3
The script looks below:
#!/usr/bin/perl
use strict;
use Data::Dumper;
print Dumper(\#ARGV);
foreach my $name(#ARGV){
print "NAME:$name\n";
my ($result, $start, $stop, $width) = "";
while(my $head = <STDIN>)
{
if( $head =~ (m/\b$name\b/g))
{
$start = (pos $head) - length($name);
$stop = (pos $head);
my $line = <STDIN>;
pos $head = $stop+1;
$head =~ (m/\b/g);
$width = (pos $head) - $start;
$result = substr($line,$start,$width);
}
}
$result =~ s/^\s*(.*?)\s*$/$1/;
print "$name=";
$result = "NULL" if ( $result eq "" );
print "$result\n";
}
Can someone please help me to get values if I pass multiple arguments also if suppose argument value have data in multiple lines it should be printed comma separated values (ex: for RC, RC=RC1,RC2,RC3).
Here is an example, assuming the input file is named file.txt and the values are starting at the same horizontal position as the keys:
package Main;
use feature qw(say);
use strict;
use warnings;
use Data::Dumper qw(Dumper);
my $self = Main->new(fn => 'file.txt', params => [#ARGV]);
$self->read_file();
$self->print_values();
sub read_file {
my ( $self ) = #_;
my $fn = $self->{fn};
open ( my $fh, '<', $fn ) or die "Could not open file '$fn': $!";
local $/ = ""; #Paragraph mode
my #blocks = <$fh>;
close $fh;
$self->{values} = {};
for my $block (#blocks) {
$self->parse_block( $block );
}
}
sub parse_block {
my ( $self, $block ) = #_;
my #lines = split /\n/, $block;
my $header = shift #lines;
my ($keys, $startpos) = $self->get_block_keys( $header );
for my $line ( #lines ) {
for my $key (#$keys) {
my $startpos = $startpos->{$key};
my $str = substr $line, $startpos;
my ( $value ) = $str =~ /^(\S+)/;
if ( defined $value ) {
push #{$self->{values}{$key}}, $value;
}
}
}
}
sub get_block_keys {
my ( $self, $header ) = #_;
my $values = $self->{values};
my #keys;
my %spos;
while ($header =~ /(\S+)/g) {
my $key = $1;
my $startpos = $-[1];
$spos{$key} = $startpos;
push #keys, $key;
}
for my $key (#keys) {
if ( !(exists $values->{$key}) ) {
$values->{$key} = [];
}
}
return (\#keys, \%spos);
}
sub new {
my ( $class, %args ) = #_;
return bless \%args, $class;
}
sub print_values {
my ( $self ) = #_;
my $values = $self->{values};
for my $key (#{$self->{params}}) {
my $value = "<NO VALUE FOUND>";
if ( exists $values->{$key} ) {
$value = join ",", #{$values->{$key}};
}
say "$key=$value";
}
}
Edit
If you want to read the file from STDIN instead, change the following part of the code:
# [...]
my $self = Main->new(params => [#ARGV]);
$self->read_file();
$self->print_values();
sub read_file {
my ( $self ) = #_;
local $/ = ""; #Paragraph mode
my #blocks = <STDIN>;
$self->{values} = {};
for my $block (#blocks) {
$self->parse_block( $block );
}
}
# [...]

Caesar Cipher -- "Isn't numeric in addition (+)"

I'm creating a Caesar cipher using Perl, but I cant seem to find the error in the code.
I keep getting the error message:
Argument "hello" isn't numeric in addition (+) at ./Lab03.pl line 66, <> line 1.
which is the line $translated += $symbol.
use warnings;
$x = 26;
sub getMode {
$e = "encrypt decrypt";
while ( 'True' ) {
print "Do you wish to encrypt or decrypt a message? \n";
$mode = <STDIN>;
chomp( $mode );
if ( $mode = split( //, $e ) ) {
return $mode;
}
else {
print "Enter either 'encrypt' or 'decrypt'.\n";
}
}
}
sub getMessage {
print "Enter your message:";
$input = <STDIN>;
chomp( $input );
return $input;
}
sub getKey {
$key = 0;
while ( 'True' ) {
print "Enter the key number (1-26): ";
$key = int( <> );
chomp( $key );
if ( $key >= 1 and $key <= $x ) {
return $key;
}
}
}
sub getTranslatedMessage {
( $mode, $message, $key ) = #_;
if ( $mode =~ /^d/ ) {
$key = -$key;
$translated = '';
}
foreach $symbol ( $message ) {
if ( $symbol =~ /[A-Za-z]/ ) {
$num = ord( $symbol );
$num += $key;
}
if ( $symbol =~ /^[A-Z]/ ) {
if ( $num > ord( 'Z' ) ) {
$num -= 26;
}
elsif ( $num < ord( 'A' ) ) {
$num += 26;
}
elsif ( $symbol = /^[a-z]/ ) {
if ( $num > ord( 'z' ) ) {
$num -= 26;
}
elsif ( $num < ord( 'a' ) ) {
$num += 26;
}
$translated += chr( $num );
}
}
else {
$translated += $symbol;
}
}
return $translated;
}
$mode = getMode();
$message = getMessage();
$key = getKey();
print "Your translated text is: '\n' ";
print( getTranslatedMessage( $mode, $message, $key ) );
In Perl, + is numeric addition only. String concatenation is . / .=.
Also:
if ($mode = split(//,$e)){
is incorrect. I believe you want something like:
my %valid_mode = ( 'encrypt' => 1, 'decrypt' => 1);
...
if ( $valid_mode{$mode} ) {
return $mode
The code you have is setting $mode into the number of characters in $e (in an inefficient way).
Here:
foreach $symbol ($message){
in Perl, strings are first class entities; they aren't automatically interpreted as arrays of characters. So to loop over the characters, you need to so something else. The simplest way is:
foreach $symbol ( split //, $message ) {
Here:
elsif ($symbol= /^[a-z]/){
= should be =~.
There is also a problem with which code is in which blocks that prevents upper case characters from being added to the output. It looks to me like the closing brace for your fir st if ($symbol =~ should be just before the later else, and other braces possibly fixed up to match.
Putting all your }'s on a line of their own, indented the same as the line with the corresponding { is a much better idea. It will help you see mismatched braces much more easily.
Here is corrected code, with use strict added and all variables declared:
use warnings;
use strict;
my $x = 26;
sub getMode{
my %valid_mode = ( 'encrypt' => 1, 'decrypt' => 1 );
while ('True'){
print"Do you wish to encrypt or decrypt a message? \n";
my $mode = <STDIN>;
chomp ( $mode);
if ($valid_mode{$mode}) {
return $mode;
}
else {
print "Enter either 'encrypt' or 'decrypt'.\n";
}
}
}
sub getMessage{
print"Enter your message:";
my $input = <STDIN>;
chomp ($input);
return $input;
}
sub getKey{
my $key = 0;
while ('True'){
print"Enter the key number (1-26): ";
$key = int(<>);
chomp ($key);
if ($key >= 1 and $key <= $x){
return $key;
}
}
}
sub getTranslatedMessage{
my ($mode, $message, $key) = #_;
if ($mode =~ /^d/){
$key = -$key;
}
my $translated = '';
foreach my $symbol (split //, $message){
if ($symbol =~ /[A-Za-z]/){
my $num = ord($symbol);
$num += $key;
if ($symbol =~ /^[A-Z]/){
if ($num > ord('Z')){
$num -= 26;
}
elsif ($num < ord('A')){
$num += 26;
}
}
elsif ($symbol=~ /^[a-z]/){
if ($num > ord('z')){
$num -= 26;
}
elsif ($num < ord('a')){
$num += 26;
}
}
$translated .= chr($num);
}
else{
$translated .= $symbol;
}
}
return $translated;
}
my $mode = getMode();
my $message = getMessage();
my $key = getKey();
print"Your translated text is:\n";
print(getTranslatedMessage($mode, $message, $key));
print "\n";
Over all, I suggest you write smaller chunks of code and test them to make sure they worked before assembling them all together.

How to concatenate values for duplicate hash keys in Perl?

I know it is not possible to have duplicate keys in a hash, but this is what my data looks like:
Key Value
SETUP_FACE_PROT great
SETUP_FACE_PROT great2
SETUP_FACE_PROT great3
SETUP_FACE_PROT great3
SETUP_ARM_PROT arm
SETUP_FOOT_PROT foot
SETUP_FOOT_PROT foot2
SETUP_HEAD_PROT goggle
I would like to concatenate values for repeated keys, separated by a * character. For example, this is what I want the output to look like:
SETUP_FACE_PROT'=great*great2*great3',
SETUP_ARM_PROT='arm',
SETUP_FOOT_PROT='foot*foot2',
SETUP_HEAD_PROT='google'
This is how I've tried to solve the problem so far:
foreach my $key ( sort keys %stuff )
{
print "$key=\'", join( "*", #{ $stuff{$key} } ), "\'\n";
}
But instead of printing the result, how can I store it in a variable so that I can pass it to another subroutine? I'm trying to create a new string that looks like this:
$newstring="
SETUP_FACE_PROT='great*great2*great3',
SETUP_ARM_PROT='arm',
SETUP_FOOT_PROT='foot*foot2',
SETUP_HEAD_PROT='google' "
You can't duplicate keys, you can create a hash of arrays.
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my %stuff;
while (<DATA>) {
my ( $key, $value ) = split;
push( #{ $stuff{$key} }, $value );
}
print Dumper \%stuff;
foreach my $key ( sort keys %stuff ) {
print "$key=\'", join( "*", #{ $stuff{$key} } ), "\'\n";
}
__DATA__
SETUP_FACE_PROT great
SETUP_FACE_PROT great2
SETUP_FACE_PROT great3
SETUP_FACE_PROT great3
SETUP_ARM_PROT arm
SETUP_FOOT_PROT foot
SETUP_FOOT_PROT foot2
SETUP_HEAD_PROT goggle
Edit:
Turning it into a string as requested:
my $results;
foreach my $key ( sort keys %stuff ) {
$results .= "$key=\'". join( "*", #{ $stuff{$key} } ). "\'\n";
}
print $results;
Or perhaps using print still with a filehandle:
my $results;
open ( my $output, '>', \$results );
foreach my $key ( sort keys %stuff ) {
print {$output} "$key=\'", join( "*", #{ $stuff{$key} } ), "\'\n";
}
close ( $output );
print $results;
At last i got an answer doing this.
use Data::Dumper;
my %stuff;
use Text::CSV;
my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
my $filenamex = 'duplicate2.csv';
$checkstring ='';
open(my $datab, '<', $filenamex) or die "Could not open '$filename' $!\n";
$i=1;
my %datan;
while (my $linea = <$datab>)
{
chomp $linea;
#fieldsx = split ",",$linea;
$key = $fieldsx[0];
$value = $fieldsx[1];
# print $key;
push( #{ $stuff{$key} }, $value );
}
foreach my $key ( sort keys %stuff )
{
$checkstring = $checkstring.','.$key.'='. join( "*", #{ $stuff{$key} } );
}
print $checkstring;

Perl subroutine returning unexpected value

I am extending CGI and trying to add a simple router to it, just for fun.
Here are my Test::More tests
use strict;
use warnings;
use Data::Dumper;
use Test::More tests => 4;
use CGI::Router;
my $router = CGI::Router->new;
my $resp;
## 1. test ##
$ENV{'REQUEST_URI'} = '/';
$ENV{'REQUEST_METHOD'} = 'GET';
$resp = $router->connect('GET /', sub {
# print Dumper #_;
return 'Hello 1';
});
# print Dumper $resp;
ok( $resp eq 'Hello 1' );
## 2. test ##
$ENV{'REQUEST_URI'} = '/hello';
$ENV{'REQUEST_METHOD'} = 'GET';
$resp = $router->connect('GET /hello', sub {
# print Dumper #_;
return 'Hello 2';
});
# print Dumper $resp;
ok( $resp eq 'Hello 2' );
## 3. test ##
$ENV{'REQUEST_URI'} = '/hello/kitty';
$ENV{'REQUEST_METHOD'} = 'GET';
$resp = $router->connect('GET /hello/:who', sub {
# print Dumper #_;
return 'Hello 3';
});
# print Dumper $resp;
ok( $resp eq 'Hello 3' );
## 4. test ##
$ENV{'REQUEST_URI'} = '/hello/kitty/kat';
$ENV{'REQUEST_METHOD'} = 'GET';
$resp = $router->connect('GET /hello/:who/:what', sub {
# print Dumper #_;
return 'Hello kitty kat';
});
# print Dumper $resp;
ok( $resp eq 'Hello kitty kat' );
$router->run;
Here is my module
package CGI::Router;
use strict;
use warnings;
use parent 'CGI';
use Carp;
use Data::Dumper;
sub connect {
my ( $self, $req, $subr ) = #_;
$self->{routes} //= {};
$self->{env} //= \%ENV;
if ( ! exists $self->{routes}->{$req} ) {
$self->{routes}->{$req} = {
handler => $subr,
pattern => $self->build_pattern( $req ),
method => $req =~ /^(GET|PUT|POST|DELETE)/
};
} else {
Carp::croak( "Similar request already exists $req!" );
}
# Get current request destination
# TODO: Add that stupid IIS HTTP header
$self->{destination} = $self->{env}->{REQUEST_URI};
$self->{method} = $self->{env}->{REQUEST_METHOD};
return $self->mapper();
}
sub mapper {
my $self = shift;
my $router;
my #params;
foreach my $key ( keys %{ $self->{routes} } ) {
my $route = $self->{routes}->{$key};
if ( $self->{method} eq $route->{method} &&
$self->{destination} =~ $route->{pattern} ) {
#params = $self->{destination} =~ $route->{pattern};
$router = $route;
}
}
return $router->{handler}->( #params );
}
sub build_pattern {
my ( $self, $pattern ) = #_;
$pattern =~ s/(GET|POST|PUT|DELETE)\s?//i;
$pattern = do {
# Replace something like /word/:token with /word/(^:([a-z]+))
$pattern =~ s!
(\:([a-z]+))
!
if ( $2 ) {
"([^/]+)"
}
!gex;
"^$pattern\$";
};
return $pattern;
}
sub run {
}
1;
When the test cases run and I Dump e.g $resp in ## 4. test ## the returned value is not some version of "Hello kitty" but 'GET'.
Here is the output of the test
1..4
ok 1
ok 2
ok 3
ok 4
Why do all the subroutines return 'GET', I don't see where I generate this output.
I know a ton of similar frameworks exists, I am just doing this for fun :)
I modified your code to work/not error. Take it or leave it ;)
CGI/Router.pm:
Things changed:
build_pattern returns a compiled regex via qr/$pattern/
connect param handling is less confusing. You were taking $self, #args off #_, but then taking $req, $subr from #args and doing nothing else with it. So I moved them up
connect returns the value of run
$foo = $bar if !defined $foo; is better written as $foo //= $bar;. Similar to $foo ||= $bar but checks for definedness rather than truth.
Code:
package CGI::Router;
use strict;
use warnings;
use parent 'CGI';
use Carp;
use Data::Dumper;
sub connect {
my ( $self, $req, $subr ) = #_;
$self->{routes} //= {};
$self->{env} //= \%ENV;
if ( !exists $self->{routes}->{$req} ) {
$self->{routes}->{$req} = {
handler => $subr,
pattern => $self->build_pattern($req),
method => $req =~ /^(GET|PUT|POST|DELETE)/
};
}
else {
Carp::croak("Similar request already exists $req!");
}
# Get current request destination
# TODO: Add that stupid IIS HTTP header
$self->{destination} = $self->{env}->{REQUEST_URI};
$self->{method} = $self->{env}->{REQUEST_METHOD};
return $self->run();
}
sub build_pattern {
my ( $self, $pattern ) = #_;
$pattern =~ s/(GET|POST|PUT|DELETE)\s?//i;
$pattern = do {
# Replace something like /word/:token with /word/(^:([a-z]+))
$pattern =~ s!
(\:([a-z]+))
!
if ( $2 ) {
"([^/]+)"
}
!gex;
"^$pattern\$";
};
return qr/$pattern/;
}
sub run {
my $self = shift;
my $router;
my #params;
foreach my $key ( keys %{ $self->{routes} } ) {
my $route = $self->{routes}->{$key};
if ( $self->{method} eq $route->{method}
&& $self->{destination} =~ $route->{pattern} )
{
#params =
$self->{destination} =~ $route->{pattern}; # Not fully working yet
$router = $route;
}
}
return $router->{handler}->(#params);
}
1;
test-router.pl:
Things changed:
The BEGIN block was doing setup that you ought not do for a test script. I.e. randomising the flow.. so I ditched that off
Added the environment variables for each test case
Code:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Test::More tests => 4;
use CGI::Router;
my $router = CGI::Router->new;
my $resp;
## 1. test ##
$ENV{'REQUEST_URI'} = '/';
$ENV{'REQUEST_METHOD'} = 'GET';
$resp = $router->connect('GET /', sub {
# print Dumper #_;
return 'Hello 1';
});
print Dumper $resp;
ok( $resp eq 'Hello 1' );
## 2. test ##
$ENV{'REQUEST_URI'} = '/hello';
$ENV{'REQUEST_METHOD'} = 'GET';
$resp = $router->connect('GET /hello', sub {
# print Dumper #_;
return 'Hello 2';
});
print Dumper $resp;
ok( $resp eq 'Hello 2' );
## 3. test ##
$ENV{'REQUEST_URI'} = '/hello/kitty';
$ENV{'REQUEST_METHOD'} = 'GET';
$resp = $router->connect('GET /hello/:who', sub {
# print Dumper #_;
return 'Hello 3';
});
print Dumper $resp;
ok( $resp eq 'Hello 3' );
## 4. test ##
$ENV{'REQUEST_URI'} = '/hello/kitty/kat';
$ENV{'REQUEST_METHOD'} = 'GET';
$resp = $router->connect('GET /hello/:who/:what', sub {
# print Dumper #_;
return 'Hello kitty kat';
});
print Dumper $resp;
ok( $resp eq 'Hello kitty kat' );

Why can't I fetch www.google.com with Perl's LWP::Simple?

I cant seem to get this peice of code to work:
$self->{_current_page} = $href;
my $response = $ua->get($href);
my $responseCode = $response->code;
if( $responseCode ne "404" ) {
my $content = LWP::Simple->get($href);
die "get failed: " . $href if (!defined $content);
}
Will return error: get failed: http://www.google.com
The full code is as follows:
#!/usr/bin/perl
use strict;
use URI;
use URI::http;
use File::Basename;
use DBI;
use LWP::Simple;
require LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->timeout(10);
$ua->env_proxy;
$ua->max_redirect(0);
package Crawler;
sub new {
my $class = shift;
my $self = {
_url => shift,
_max_link => 0,
_local => 1
};
bless $self, $class;
return $self;
}
sub trim{
my( $self, $string ) = #_;
$string =~ s/^\s+//;
$string =~ s/\s+$//;
return $string;
}
sub process_image {
my ($self, $process_image) = #_;
$self->{_process_image} = $process_image;
}
sub local {
my ($self, $local) = #_;
$self->{_local} = $local;
}
sub max_link {
my ($self, $max_link) = #_;
$self->{_max_link} = $max_link;
}
sub x_more {
my ($self, $x_more) = #_;
$self->{_x_more} = $x_more;
}
sub resolve_href {
my ($base, $href) = #_;
my $uri = URI->new($href);
return $uri->rel($base);
}
sub write {
my ( $self, $ref, $data ) = #_;
open FILE, '>c:/perlscripts/' . $ref . '_' . $self->{_process_image} . '.txt';
foreach( $data ) {
print FILE $self->trim($_) . "\n";
}
close( FILE );
}
sub scrape {
my #m_error_array;
my #m_href_array;
my #href_array;
my ( $self, $DBhost, $DBuser, $DBpass, $DBname ) = #_;
my ($dbh, $query, $result, $array);
my $DNS = "dbi:mysql:$DBname:$DBhost:3306";
$dbh = DBI->connect($DNS, $DBuser, $DBpass ) or die $DBI::errstr;
if( defined( $self->{_process_image} ) && ( -e 'c:/perlscripts/href_w_' . $self->{_process_image} . ".txt" ) ) {
open ERROR_W, "<c:/perlscripts/error_w_" . $self->{_process_image} . ".txt";
open M_HREF_W, "<c:/perlscripts/m_href_w_" . $self->{_process_image} . ".txt";
open HREF_W, "<c:/perlscripts/href_w_" . $self->{_process_image} . ".txt";
#m_error_array = <ERROR_W>;
#m_href_array = <M_HREF_W>;
#href_array = <HREF_W>;
close ( ERROR_W );
close ( M_HREF_W );
close ( HREF_W );
}else{
#href_array = ( $self->{_url} );
}
my $z = 0;
while( #href_array ){
if( defined( $self->{_x_more} ) && $z == $self->{_x_more} ) {
last;
}
if( defined( $self->{_process_image} ) ) {
$self->write( 'm_href_w', #m_href_array );
$self->write( 'href_w', #href_array );
$self->write( 'error_w', #m_error_array );
}
$self->{_link_count} = scalar #m_href_array;
my $href = shift( #href_array );
my $info = URI::http->new($href);
my $host = $info->host;
$host =~ s/^www\.//;
$result = $dbh->prepare("INSERT INTO `". $host ."` (URL) VALUES ('$href')");
if( ! $result->execute() ){
$result = $dbh->prepare("CREATE TABLE `" . $host . "` ( `ID` INT( 255 ) NOT NULL AUTO_INCREMENT , `URL` VARCHAR( 255 ) NOT NULL , PRIMARY KEY ( `ID` )) ENGINE = MYISAM ;");
$result->execute();
}
$self->{_current_page} = $href;
my $response = $ua->get($href);
my $responseCode = $response->code;
if( $responseCode ne "404" ) {
my $content = LWP::Simple->get($href);
die "get failed: " . $href if (!defined $content);
}
#print $responseCode;
}
}
1;
#$query = "SELECT * FROM `actwebdesigns.co.uk` ORDER BY ID DESC";
#$result = $dbh->prepare($query);
#$result->execute();
#while( $array = $result->fetchrow_hashref() ) {
# print $array->{'URL'} . "\n";
#}
EDIT:
Still not working with redirect fixed.
my $redirect_limit = 10;
my $y = 0;
while( 1 && $y le $redirect_limit ) {
my $response = $ua->get($href);
my $responseCode = $response->code;
if( $responseCode == 200 || $responseCode == 301 || $responseCode == 302 ) {
if( $responseCode == 301 || $responseCode == 302 ) {
$href = $response->header('Location');
}else{
last;
}
}else{
push( #m_error_array, $href );
last;
}
$y++;
}
if( $y ne $redirect_limit ) {
if( ! defined( $self->{_url_list} ) ) {
my #url_list = ( $href );
}else{
my #url_list = $self->{_url_list};
push( #url_list, $href );
$self->{_url_list} = #url_list;
}
my $content = LWP::Simple->get($href);
die "get failed: " . $href if (!defined $content);
#$result = $dbh->prepare("INSERT INTO `". $host ."` (URL) VALUES ('$href')");
#if( ! $result->execute() ){
# $result = $dbh->prepare("CREATE TABLE `" . $host . "` ( `ID` INT( 255 ) NOT NULL AUTO_INCREMENT , `URL` VARCHAR( 255 ) NOT NULL , PRIMARY KEY ( `ID` )) ENGINE = MYISAM ;");
# $result->execute();
#}
print "good";
}else{
push( #m_error_array, $href );
}
You should examine the response code to see what's happening (you're already checking for 404s). I get a 302 - a redirect.
For example:
die "get failed ($responseCode): " . $href if (!defined $content);
Resulting message:
get failed (302): http://www.google.com at goog.pl line 20.
A couple of thoughts.
1/ You seems to be using the string comparison operators (le, ne) to compare numbers. You should use the numeric comparison operators (<=, !=) instead.
2/ The value you get back from the LWP::UserAgent::get call is an HTTP::Response object. Judicious use of that class's "is_foo" method might make your code a bit cleaner.
I don't know if either of these will solve your problem. But they'll improve the quality of your code.
Here's your problem:
my $content = LWP::Simple->get($href);
That passes the string "LWP::Simple" as the first argument to 'get'. You want:
my $content = LWP::Simple::get($href);
Check your SELinux settings.
SELINUX enabled systems will not allow an outgoing connection from a web agent (httpd).
This page can tell you more about SELinux and HTTPD settings:
http://wiki.centos.org/TipsAndTricks/SelinuxBooleans
Enable outbound web connections from Apache in a Perl script:
# setsebool -P httpd_can_network_connect on