This question is somewhat related to What’s the simplest way to make a HTTP GET request in Perl?.
Before making the request via LWP::Simple I have a hash of query string components that I need to serialize/escape. What's the best way to encode the query string? It should take into account spaces and all the characters that need to be escaped in valid URIs. I figure it's probably in an existing package, but I'm not sure how to go about finding it.
use LWP::Simple;
my $base_uri = 'http://example.com/rest_api/';
my %query_hash = (spam => 'eggs', foo => 'bar baz');
my $query_string = urlencode(query_hash); # Part in question.
my $query_uri = "$base_uri?$query_string";
# http://example.com/rest_api/?spam=eggs&foo=bar+baz
$contents = get($query_uri);
URI::Escape is probably the most direct answer, as other have given, but I would recommend using a URI object for the entire thing. URI automatically escapes the GET parameters for you (using URI::Escape).
my $uri = URI->new( 'http://example.com' );
$uri->query_form(foo => '1 2', bar => 2);
print $uri; ## http://example.com?foo=1+2&bar=2
As an added bonus, LWP::Simple's get function will take a URI object as it's argument instead of a string.
URI::Escape does what you want.
use URI::Escape;
sub escape_hash {
my %hash = #_;
my #pairs;
for my $key (keys %hash) {
push #pairs, join "=", map { uri_escape($_) } $key, $hash{$key};
}
return join "&", #pairs;
}
URI is far simpler than URI::Escape for this. The method query_form() accepts a hash or a hashref:
use URI;
my $full_url = URI->new('http://example.com');
$full_url->query_form({"id" => 27, "order" => "my key"});
print "$full_url\n"; # http://example.com?id=27&order=my+key
Use the module URI to build the URL with the query parameters:
use LWP::Simple;
use URI;
my $uri_object = URI->new('http://example.com/rest_api/');
$uri_object->query_form(spam => 'eggs', foo => 'bar baz');
$contents = get("$uri_object");
I found this solution here.
Use LWP::UserAgent instead:
use strict;
use warnings;
use LWP::UserAgent;
my %query_hash = (spam => 'eggs', foo => 'bar baz');
my $ua = LWP::UserAgent->new();
my $resp = $ua->get("http://www.foobar.com", %query_hash);
print $resp->content;
It takes care of the encoding for you.
If you want a more generic encoding solution, see HTML::Entities.
EDIT: URI::Escape is a better choice.
URI::Escape is the module you are probably thinking of.
Related
I'm writing my first perl script for the requirement
generate HTTP request against a particular web uri in succession using different URL scheme patterns
use HTTP::Request::Generator 'generate_requests';
use URI;
use HTTP::Request::Common;
use strict; # safety net
use warnings; # safety ne
use Test::LWP::UserAgent 'send_request';
use LWP::UserAgent 'send_request';
use Test::More;
use URI;
use HTTP::Request::Common;
use LWP::UserAgent;
my $g = generate_requests(
method => 'POST',
host => ['example.com','www.example.com'],
pattern => 'https://example.com/{bar,foo,gallery}/[00..99].html',
wrap => sub {
my( $req ) = #_;
# Fix up some values
$req->{headers}->{'Content-Length'} = 666;
},
);
while( my $r = $g->()) {
send_request( $r );
};
I'm using atom editor and activeperl on windows 10, I get following error from running above code.
Undefined subroutine &main::send_request called at C:\Users\ADMINI~1\AppData\Local\Temp\atom_script_tempfiles\0ac821e0-0886-11eb-9588-291dbc37d883 line 57.
I have already installed all necessary modules and lib but i think its unable to refer the method send_request. Pls assist.
NOTE
I have replaced real values in variable for privacy reasons.
UPDATE
I plan to use following module
pattern => 'https://example.{com,org,net}/page_[00..99].html', from
https://metacpan.org/pod/HTTP::Request::Generator.
LWP::UserAgent is an object-oriented module. It doesn't export functions. You want to call send_request like this:
my $ua = 'LWP::UserAgent'->new;
while ( my $r = $g->() ) {
$ua->send_request( $r );
}
That said, send_request is an undocumented internal method. I think it is probably more intended for people who are subclassing LWP::UserAgent. You probably want the request method instead.
my $ua = 'LWP::UserAgent'->new;
while ( my $r = $g->() ) {
my $response = $ua->request( $r );
}
Full code:
use strict;
use warnings;
use HTTP::Request::Generator 'generate_requests';
use LWP::UserAgent;
my $ua = 'LWP::UserAgent'->new;
my $gen = generate_requests(
method => 'POST',
host => [ 'example.com', 'www.example.com' ],
pattern => 'https://example.com/{bar,foo,gallery}/[00..99].html',
wrap => sub {
my ( $req ) = #_;
# Fix up some values
$req->{'headers'}{'Content-Length'} = 666;
},
);
while ( my $req = $gen->() ) {
my $response = $ua->request( $req );
# Do something with $response here?
}
I'm failing to get a node by its id.
The code is straight forward and should be self-explaining.
#!/usr/bin/perl
use Encode;
use utf8;
use LWP::UserAgent;
use URI::URL;
use Data::Dumper;
use HTML::TreeBuilder::XPath;
my $url = 'https://www.airbnb.com/rooms/1976460';
my $browser = LWP::UserAgent->new;
my $resp = $browser->get( $url, 'User-Agent' => 'Mozilla\/5.0' );
if ($resp->is_success) {
my $base = $resp->base || '';
print "-> base URL: $base\n";
my $data = $resp->decoded_content;
my $tree= HTML::TreeBuilder::XPath->new;
$tree->parse_content( $resp->decoded_content() );
binmode STDOUT, ":encoding(UTF-8)";
my $price_day = $tree->find('.//*[#id="price_amount"]/');
print Dumper($price_day);
$tree->delete();
}
The code above prints:
-> base URL: https://www.airbnb.com/rooms/1976460
$VAR1 = undef;
How can I select a node by its ID?
Thanks in advance.
Take that / off the end of that XPath.
.//*[#id="price_amount"]
should do. As it is, it's not valid XPath.
There is a trailing slash in your XPath, that you need to remove
my $price_day = $tree->find('.//*[#id="price_amount"]');
However, from my own testing, I believe that HTML::TreeBuilder::XPath is also having trouble parsing that specific URL. Perhaps because of the conditional comments?
As an alternative approach, I would recommend using Mojo::UserAgent and Mojo::DOM instead.
The following uses the css selector div#price_amount to easily find your desired element and print it out.
use strict;
use warnings;
use Mojo::UserAgent;
my $url = 'https://www.airbnb.com/rooms/1976460';
my $dom = Mojo::UserAgent->new->get($url)->res->dom;
my $price_day = $dom->at(q{div#price_amount})->all_text;
print $price_day, "\n";
Outputs:
$285
Note, there is a helpful 8 minute introductory video to this set of modules at Mojocast Episode 5.
I am getting www:Facebook:api in perl and CPAN
error while using the Use of uninitialized value within %field in hash element at /usr/share/perl5/WWW/Facebook/API/Auth.pm line 62.
i defined all keys
#!/usr/bin/perl -w
use strict;
use warnings;
use CGI;
use WWW::Facebook::API;
use WWW::Facebook::API::Auth;
use HTTP::Request;
use LWP;
my $TMP = $ENV{HOME}.'/tmp';
my $facebook_api = '--------';
my $facebook_secret = '-------';
my $facebook_clientid = '--------';
my $gmail_user = '-------';
my $gmail_password = '--------';
my $client = WWW::Facebook::API->new(
desktop => 1,
api_version => '1.0',
api_key => $facebook_api,
secret => $facebook_secret,
throw_errors => 1,
);
$client->app_id($facebook_clientid);
local $SIG{INT} = sub {
print "Logging out of Facebookn";
my $r = $client->auth->logout;
exit(1);
};
my $token = $client->auth->create_token;
print "$token \n";
$client->auth->get_session($token);
print "$client \n";
WWW::Facebook::API doesn't look like it's been updated for a while. Line 62 of that file is:
$self->base->{ $field{$key} } = $resp->{$key};
The undefined value is the $field{$key} part. The %fieldhash is a hard-coded mapping between the names of Facebook API's known fields (i.e. the fields in the data Facebook returns to you) and the names which the module wants them to be called. It seems that Facebook has added some additional fields to its data, and the module has not been updated to deal with them.
Ultimately, this is just a warning; you can just ignore it if you like. If you want your script's output to be a bit tidier, you could change that line to:
$self->base->{ $field{$key} } = $resp->{$key} if defined $field{$key};
i am trying to write a script that will navigate through a soccer website to the player of my choice and scrape their info for me. I have the scraping part working by just hard coding an individual player's page in, but trying to implement the navigation is giving me some problems. The website in question is http://www.soccerbase.com.
I have to fill in a form present at the top of the page with the player's name, then submit it for the search. I have tried it two different ways(commenting out one of them) based on info i found online but to no avail. I am an absolute novice when it comes to Perl so any help would be greatly appreciated! Thanks in advance. here is my code:
#!/usr/bin/perl
use strict;
require WWW::Mechanize;
require HTML::TokeParser;
my $player = 'Luis Antonio Valencia';
#die "Must provide a player's name" unless $player ne 1;
my $agent = WWW::Mechanize->new();
$agent->get('http://www.soccerbase.com/players/home.sd');
$agent->form_name('headSearch');
$agent->set_fields('searchTeamField', $player);
$agent->click_button(name=>"Search");
#$agent->submit_form(
# form_number => 1,
# fields => { => 'Luis Antonio Valencia', }
# );
my $stream = HTML::TokeParser->new(\$agent->{content});
my $player_name;
$stream->get_tag("strong");
$player_name = $stream->get_trimmed_text("/strong");
print "\n", "Player Name: ", $player_name, "\n";
It's a bit tricky because the form action plays switcharoo with Javascript, but HTML::Form is able to handle that perfectly fine:
#!/usr/bin/env perl
use WWW::Mechanize qw();
use URI qw();
my $player = 'Luis Antonio Valencia';
my $agent = WWW::Mechanize->new;
$agent->get('http://www.soccerbase.com/players/home.sd');
my $form = $agent->form_id('headSearch');
{
my $search_uri = $agent->uri;
$search_uri->path('/players/search.sd');
$form->action($search_uri);
# requires absolute URI
}
$agent->submit_form(
fields => {
search => $player,
type => 'player',
}
);
Easier way is to look at the HTTP request it makes, for instance:
http://www.soccerbase.com/players/search.sd?search=kkkk&type=player
'kkkk' is the player name, use LWP::UserAgent to make that request, and it will give you the result, change the 'kkk' to the name of the player you are looking to get info for, and that will do the job, using Mech for that is an overkill, if you ask me, make sure that if the player name has spaces,etc encode it.
It looks like the form elements do not have name attributes and I am assuming the query string is formed by some other means by translating the id attributes to yield:
http://www.soccerbase.com/players/search.sd?search=Luis+Antonio+Valencia&type=player
You'd think the following would work, but it doesn't suggesting that there is some other JavaScript goodness(!) happening behind the scenes.
#!/usr/bin/env perl
use strict;
use warnings;
use HTML::TableExtract;
use LWP::Simple qw(get);
use URI;
my $player = 'Luis Antonio Valencia';
my $uri = URI->new('http://www.soccerbase.com/players/home.sd');
$uri->query_form(
search => $player,
type => 'player',
);
my $content = get "$uri";
die "Failed to get '$uri'\n" unless defined $content;
my $te = HTML::TableExtract->new(
attribs => { class => 'clubInfo' },
);
$te->parse($content);
die unless $te->tables;
my ($table) = $te->tables;
my ($row) = $table->rows;
print $row->[1], "\n";
I have the Perl & LWP book, but how do I set the user-agent string?
This is what I've got:
use LWP::UserAgent;
use LWP::Simple; # Used to download files
my $u = URI->new($url);
my $response_u = LWP::UserAgent->new->get($u);
die "Error: ", $response_u->status_line unless $response_u->is_success;
Any suggestions, if I want to use LWP::UserAgent like I do here?
From the LWP cookbook:
use LWP::UserAgent;
$ua = new LWP::UserAgent;
$ua->agent("$0/0.1 " . $ua->agent);
# $ua->agent("Mozilla/8.0") # pretend we are very capable browser
$req = new HTTP::Request 'GET' => 'http://www.sn.no/libwww-perl';
$req->header('Accept' => 'text/html');
# send request
$res = $ua->request($req);
I do appreciate the LWP cookbook solution which mentions the subclassing solution with a passing reference to lwp-request.
a wise perl monk once said: the ole subclassing LWP::UserAgent trick
package AgentP;
use base 'LWP::UserAgent';
sub _agent { "Mozilla/8.0" }
sub get_basic_credentials {
return 'admin', 'password';
}
package main;
use AgentP;
my $agent = AgentP->new;
my $response = $agent->get( 'http://127.0.0.1/hideout.html' );
print $agent->agent();
the entry has been revised with some poor humor, use statement, _agent override, and updated agent print line.
Bonus material for the interested: HTTP basic auth provided with get_basic_credentials override, which is how most people come to find the subclassing solution. _methods are sacred or something; but it does scratch an itch doesn't it?