I'm making progress but I've run into a new problem.
This is the new code:
#!/usr/bin/perl -w
use strict;
use LWP::Simple;
use HTML::TreeBuilder;
my $url = 'http://oreilly.com/store/complete.html';
my $page = get( $url ) or die $!;
my $p = HTML::TreeBuilder->new_from_content( $page );
my($book);
my($edition);
my #links = $p->look_down(
_tag => 'a',
href => qr{^ /Qhttp://www.oreilly.com/catalog/\E \w+ $}x
);
my #rows = map { $_->parent->parent } #links;
my #books;
for my $row (#rows) {
my %book;
my #cells = $row->look_down( _tag => 'td' );
$book{title} =$cells[0]->as_trimmed-text;
$book{price} =$cells[2]->as_trimmed-text;
$book{price} =~ s/^\$//;
$book{url} = get_url( $cells[0] );
$book{ebook} = get_url( $cells[3] );
$book{safari} = get_url( $cells[4] );
$book{examples} = get_url( $cells[5] );
push #books, \%book;
}
sub get_url {
my $node = shift;
my #hrefs = $node->look_down( _tag => 'a');
return unless #hrefs;
my $url = $hrefs[0]->atr('href');
$url =~ s/\s+$//;
return $url;
}
$p = $p->delete; #we don't need this anymore.
{
my $count = 1;
my #perlbooks = sort { $a->{price} <=> $b->{price} }
grep { $_->{title} =~/perl/i } #books;
print $count++, "\t", $_->{price}, "\t", $_->{title} for #perlbooks;
}
{
my #perlbooks = grep { $_->{title} =~ /perl/i } #books;
my #javabooks = grep { $_->{title} =~ /java/i } #books;
my $diff = #javabooks - #perlbooks;
print "There are ".#perlbooks." Perl books and ".#javabooks. " Java books. $diff more Java than Perl.";
}
for my $book ( $books[34] ) {
my $url = $book->{url};
my $page = get( $url );
my $tree = HTML::TreeBuilder->new_from_content( $page );
my ($pubinfo) = $tree->look_down(
_tag => 'span',
class => 'secondary2'
);
my $html = $pubinfo->as_HTML; print $html;
my ($pages) = $html =~ /(\d+) pages/,
my ($edition) = $html =~ /(\d)(?:st|nd|rd|th) Edition/;
my ($date) = $html =~ /(\w+ (19|20)\d\d)/;
print "\n$pages $edition $date\n";
my ($img_node) = $tree->look_down(
_tag => 'img',
src => qr{^/catalog/covers/},
);
my $img_url = 'http://www.oreilly.com'.$img_node->attr('src');
my $cover = get( $img_url );
# now save $cover to disk
}
Now I'm getting these errors,
Bareword "text" not allowed while "strict subs" in use at ./SpiderTutorial_19_06.pl line 23.
Bareword "text" not allowed while "strict subs" in use at ./SpiderTutorial_19_06.pl line 24.
Execution of ./SpiderTutorial_19_06.pl aborted due to compilation errors.
Any help would be greatly appreciated.
I don't know the original program but most likely as_trimmed-text should be as_trimmed_text.
The problem is the method name as_trimmed-text. Hyphens aren't allowed in names in perl. You probably meant as_trimmed_text. Now it parsed as $cells[0]->as_trimmed() - text().
Related
I don't understand what undef is doing in this snippet:
$dbh->do (qq {
INSERT INTO todo SET t = NOW(), status = 'open', content = ?
}, undef, $content);
Can someone please explain? I think I understand the whole code, but not this where it came from.
use warnings;
use strict;
use lib q(/data/TEST/perl/lib);
use CGI qw(:standard);
use WebDB;
sub insert_item {
my $content = shift;
my $dbh;
$content =~ s/^\s+//;
$content =~ s/^\s+$//;
if ($content ne "") {
$dbh = WebDB::connect();
$dbh->do (qq {
INSERT INTO todo SET t = NOW(), status = 'open', content = ?
}, undef, $content);
$dbh->disconnect();
}
}
sub display_entry_form {
print start_form(-action=> url()),
"To-do item:", br (),
textarea ( -name => "content",
-value => "",
-override => 1,
-rows =>3,
-columns => 80),
br (),
submit(-name=> "choice", -value => "Submit"),
end_form();
}
print header(), start_html(-title=>"To-Do List", -bgcolor => "white"), h2("To-Do List");
my $choice = lc(param ("choice"));
if ($choice eq "") {
display_entry_form();
} elsif ( $choice eq "submit" ) {
insert_item(param("content"));
display_entry_form();
} else {
print p ("Logic error, unknown choice: $choice");
}
The do() method takes 3 arguments: the query, query attributes, and bind data. The undef in your example means that there are no attributes to apply.
See "do()" in DBI on CPAN.
$rows = $dbh->do($statement) or die $dbh->errstr;
$rows = $dbh->do($statement, \%attr) or die $dbh->errstr;
$rows = $dbh->do($statement, \%attr, #bind_values) or die ...
Testing the generation on the fly of config files in ini format with the package Config::Simple generates the desired file but always with the same header section, which includes the name of the Perl package at the beginning of the file. Does Config::Simple have a method to modify this default printing? I would like to replace this name with a new one. Is it possible with some feature in the package?
Here is a toy code:
#!/usr/bin/perl
use strict;
use warnings;
use Config::Simple;
my $cfg = new Config::Simple(
syntax => 'ini'
) or die Config::Simple->error();
$cfg->param("Program.mode", "ALL");
$cfg->param("Program.strategies", "1,2,3,4,5,6,7,8,9,10");
$cfg->param("Data.name_specie", "Homo sapiens");
$cfg->write("test.cfg") or die $cfg->error();
The output:
; Config::Simple 4.58
; Mon Mar 16 12:33:55 2020
[Program]
strategies=1,2,3,4,5,6,7,8,9,10
mode=ALL
[Data]
name_specie=Homo sapiens
Just wanted to replace the ; Config::Simple 4.58 line.
Thanks in advance for your suggestions.
"Simple" modules often aren't simple because they make so many decisions for you. If you don't like those decisions, you are stuck. This particular module hasn't been updated in over a decade and has several architectural issues. If you want INI files, use a different module, such as Config:: IniFiles, Config::Tiny, or Config::INI which are maintained and flexible.
My first thought was to subclass and override the method that adds the header since it was hard-baked into one of the methods. This is onerous because the module uses a combination of "private" subroutines and package variables. I tend to try this first because I don't disturb the original package:
use strict;
use warnings;
use v5.12;
use Config::Simple;
package Local::Config::Simple {
use parent qw(Config::Simple);
# generates a writable string
sub as_string {
my $self = shift;
my $syntax = $self->{_SYNTAX} or die "'_SYNTAX' is not defined";
my $sub_syntax = $self->{_SUB_SYNTAX} || '';
my $currtime = localtime;
my $STRING = undef;
if ( $syntax eq 'ini' ) {
while ( my ($block_name, $key_values) = each %{$self->{_DATA}} ) {
unless ( $sub_syntax eq 'simple-ini' ) {
$STRING .= sprintf("[%s]\n", $block_name);
}
while ( my ($key, $value) = each %{$key_values} ) {
my $values = join (Config::Simple::WRITE_DELIM(), map { Config::Simple::quote_values($_) } #$value);
$STRING .= sprintf("%s=%s\n", $key, $values );
}
$STRING .= "\n";
}
} elsif ( $syntax eq 'http' ) {
while ( my ($key, $value) = each %{$self->{_DATA}} ) {
my $values = join (Config::Simple::WRITE_DELIM(), map { Config::Simple::quote_values($_) } #$value);
$STRING .= sprintf("%s: %s\n", $key, $values);
}
} elsif ( $syntax eq 'simple' ) {
while ( my ($key, $value) = each %{$self->{_DATA}} ) {
my $values = join (Config::Simple::WRITE_DELIM(), map { Config::Simple::quote_values($_) } #$value);
$STRING .= sprintf("%s %s\n", $key, $values);
}
}
$STRING .= "\n";
return $STRING;
}
}
my $cfg = Local::Config::Simple->new(
syntax => 'ini'
) or die Config::Simple->error();
$cfg->param("Program.mode", "ALL");
$cfg->param("Program.strategies", "1,2,3,4,5,6,7,8,9,10");
$cfg->param("Data.name_specie", "Homo sapiens");
$cfg->write("file.ini") or die $cfg->error();
That works and gives the output:
[Data]
name_specie=Homo sapiens
[Program]
mode=ALL
strategies=1,2,3,4,5,6,7,8,9,10
However, it broke several OO ideas, so I find this approach unpleasant. I can do a little bit less work by fixing the original package by redefining the original subroutine. Then the package variables and subroutines still work. Load the original module first then add your redefinitions:
use strict;
use warnings;
use v5.12;
use Config::Simple;
no warnings 'redefine';
package Config::Simple {
# generates a writable string
sub as_string {
my $self = shift;
my $syntax = $self->{_SYNTAX} or die "'_SYNTAX' is not defined";
my $sub_syntax = $self->{_SUB_SYNTAX} || '';
my $currtime = localtime;
my $STRING = undef;
if ( $syntax eq 'ini' ) {
while ( my ($block_name, $key_values) = each %{$self->{_DATA}} ) {
unless ( $sub_syntax eq 'simple-ini' ) {
$STRING .= sprintf("[%s]\n", $block_name);
}
while ( my ($key, $value) = each %{$key_values} ) {
my $values = join (WRITE_DELIM, map { quote_values($_) } #$value);
$STRING .= sprintf("%s=%s\n", $key, $values );
}
$STRING .= "\n";
}
} elsif ( $syntax eq 'http' ) {
while ( my ($key, $value) = each %{$self->{_DATA}} ) {
my $values = join (WRITE_DELIM, map { quote_values($_) } #$value);
$STRING .= sprintf("%s: %s\n", $key, $values);
}
} elsif ( $syntax eq 'simple' ) {
while ( my ($key, $value) = each %{$self->{_DATA}} ) {
my $values = join (WRITE_DELIM, map { quote_values($_) } #$value);
$STRING .= sprintf("%s %s\n", $key, $values);
}
}
$STRING .= "\n";
return $STRING;
}
}
my $cfg = Config::Simple->new(
syntax => 'ini'
) or die Config::Simple->error();
$cfg->param("Program.mode", "ALL");
$cfg->param("Program.strategies", "1,2,3,4,5,6,7,8,9,10");
$cfg->param("Data.name_specie", "Homo sapiens");
$cfg->write("file.ini") or die $cfg->error();
I write quite a bit about this in Effective Perl Programming as a way to deal with legacy code.
As a side note, you asked if there was some method in the module. You could have simply looked at the source to see what was happening and what was available. You would have seen that the header was hard-coded into as_string.
I'm beginner in Perl programming, and I want to learn about OOP in Perl. I develop a simple application about sql creator, but I have a problem. I think the problem is passing multiple parameters.
package SqlCreator;
use warnings;
use strict;
sub new {
my ($class, %args) = #_;
return bless \%args, $class;
}
sub editArray {
my (#array) = #_;
my $text = '';
my $arraySize = scalar #array;
for(my $i = 1; $i<$arraySize; $i++) {
$text .= "'" . $array[$i] . "'". ",";
}
my $length = length $text;
$text = substr($text, 0, $length-1);
return $text;
}
sub createInsertColumn {
my (#column ) = #_;
my $sql = '';
$sql .= ' (';
$sql .= editArray(#column);
$sql .= ')';
return $sql;
}
sub createInsertValue{
my (#value) = #_;
my $sql = '';
$sql .= ' VALUES (';
$sql .= editArray(#value);
$sql .= ')';
return $sql;
}
sub createInsertSql{
my ($table, #column, #value) = #_;
my $sql = 'INSERT INTO ' . $table;
$sql .= createInsertColumn(#column);
$sql .= createInsertValue(#value);
return $sql;
}
1;
use warnings;
use strict;
use SqlCreator;
my $object = SqlCreator->new;
my #column = ('name', 'gender', 'age');
my #value = ('Mehmed Fatih Temiz', 'male', 28);
my $sql = $object->createInsertSql('person', #column, #value);
print $sql;
This is my sample code. If you solve the problem, please help me.
You can't pass arrays to subs, only scalars. When you do
my $sql = $object->createInsertSql(
'person', #column, #value );
you are passing the following to the method:
$object, 'person', $column[0], $column[1], ..., $value[0], $value[1], ...
Inside the method, you have
my ( $table, #column, #value ) = #_;
First of all, you forgot to account for the invocant. There should be a leading $self parameter.
Secondly, there's no way to know how many of the scalars to add to #column and how many to add to #value, so all but the first are added to #column.
This means you were effectively doing the following:
my $table = $object;
my #column = ( 'person', $column[0], $column[1], ..., $value[0], $value[1], ... );
my #value = ();
If you want to pass an array, pass a reference to it instead.
my $sql = $object->createInsertSql( 'person', \#column, \#value );
The sub becomes
sub createInsertSql {
my ( $self, $table, $column, $value ) = #_;
my $sql = 'INSERT INTO '.$table; # XXX Injection bug
$sql .= createInsertColumn(#$column);
$sql .= createInsertValue(#$value);
return $sql;
}
By the way, your code is full of injection errors. You aren't properly converting text into SQL identifiers and SQL string literals.
Fixed:
package SqlCreator;
use warnings;
use strict;
sub new {
my ( $class, %args ) = #_;
return bless \%args, $class;
}
sub createInsertSql{
my ( $self, $table, $cols, $vals ) = #_;
my $dbh = $self->{dbh};
return sprintf(
'INSERT INTO %s ( %s ) VALUES ( %s )'
$dbh->quote_identifier($table),
( join ', ', map { $dbh->quote_identifier($_) } #$cols ),
( join ', ', map { $dbh->quote($_) } #$vals ),
);
}
1;
use warnings;
use strict;
use DBI qw( );
use SqlCreator qw( );
my $dbh = DBI->connect(...);
my $sql_creator = SqlCreator->new( dbh => $dbh );
my #cols = ( 'name', 'gender', 'age' );
my #vals = ( 'Mehmed Fatih Temiz', 'male', 28 );
my $sql = $sql_creator->createInsertSql( 'person' , \#cols , \#vals );
print "$sql\n";
#!/usr/bin/perl -w
use WWW::LinkedIn;
use CGI; # load CGI routines
use CGI::Session;
$q = CGI->new; # create new CGI object
print $q->header, # create the HTTP header
$q->start_html('hello world'), # start the HTML
$q->h1('hello world'), # level 1 header
$q->end_html; # end the HTML
my $consumer_key = 'xxxxxxx';
my $consumer_secret = 'xxxxxxxxx';
my $li = WWW::LinkedIn->new(
consumer_key => $consumer_key,
consumer_secret => $consumer_secret,
);
if ( length( $ENV{'QUERY_STRING'} ) > 0 ) {
$buffer = $ENV{'QUERY_STRING'};
#pairs = split( /&/, $buffer );
foreach $pair (#pairs) {
( $name, $value ) = split( /=/, $pair );
$value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
$in{$name} = $value;
}
$sid = $q->cookie('CGISESSID') || $q->param('CGISESSID') || undef;
$session = new CGI::Session( undef, $sid, { Directory => '/tmp' } );
my $access_token = $li->get_access_token(
verifier => $in{'oauth_verifier'},
request_token => $session->param("request_token"),
request_token_secret => $session->param("request_token_secret"),
);
undef($session);
my $profile_xml = $li->request(
request_url =>
'http://api.linkedin.com/v1/people/~:(id,first-name,last-name,positions,industry,distance)',
access_token => $access_token->{token},
access_token_secret => $access_token->{secret},
);
print $profile_xml;
}
The output is printing in single line. I want to print that is separate line.
OUTPUT
aAVGFD34 jj DD 456456 2003 6 true ara systems Technology and Services Technology and Services 0
How can i get the each column value from the profile_xml variable?
id avsdff
first name jj
lastname dd
Simply use Data::Dumper and XML::Simple.
use Data::Dumper;
use XML::Simple; #you may want to install a specific package from your distribution
{...}
my $hash_ref = SimpeXML::XMLin($profile_xml);
print Dumper($hash_ref);
I do not know if you would like more beautifully output.
try just to make simple print out from your hash reference
foreach $key (keys %{$profile_xml}) {
print "$key $profile_xml->{$key}\n";
}
Here i am going the show the way to parse the data and print in the individual lines.
my $parser = XML::Parser->new( Style => 'Tree' );
my $tree = $parser->parse( $profile_xml );
#print Dumper( $tree ); you can use this see the data displayed in the tree formatted
my $UID = $tree->[1]->[4]->[2],"\n";
print "User ID:$UID";
print"</br>";
my $FirstName = $tree->[1]->[8]->[2],"\n";
print "First Name:$FirstName";
print"</br>";
For sample i have showed for UID and FirstName. And this is working fine.
How to get the value of a parameter code using URI::URL Perl module?
From this link:
http://www.someaddress.com/index.html?test=value&code=INT_12345
It can be done using URI::URL or URI (I know the first one is kind of obsolete). Thanks in advance.
Create a URI object and use the query_form method to get the key/value pairs for the query. If you know that the code parameter is only specified once, you can do it like this:
my $uri = URI->new("http://www.someaddress.com/index.html?test=value&code=INT_12345");
my %query = $uri->query_form;
print $query{code};
Alternatively you can use URI::QueryParam whichs adds soem aditional methods to the URI object:
my $uri = URI->new("http://www.someaddress.com/index.html?test=value&code=INT_12345");
print $uri->query_param("code");
use URI;
my $uri = URI->new("http://someaddr.com/index.html?test=FIRST&test=SECOND&code=INT_12345");
my %query = $uri->query_form;
use Data::Dumper;
print Dumper \%query;
We can see:
$VAR1 = {
'test' => 'SECOND',
'code' => 'INT_12345'
};
Unfortunately, this result is wrong.
There is possible solution:
use URI::Escape;
sub parse_query {
my ( $query, $params ) = #_;
$params ||= {};
foreach $var ( split( /&/, $query ) ){
my ( $k, $v ) = split( /=/, $var );
$k = uri_unescape $k;
$v = uri_unescape $v;
if( exists $params->{$k} ) {
if( 'ARRAY' eq ref $params->{$k} ) {
push #{ $params->{$k} }, $v;
} else {
$params->{$k} = [ $params->{$k}, $v ];
}
} else {
$params->{$k} = $v;
}
}
return $params;
}