Perl: Problems with WWW:Mechanize and a form - perl

i am trying to write a script that will navigate through a soccer website to the player of my choice and scrape their info for me. I have the scraping part working by just hard coding an individual player's page in, but trying to implement the navigation is giving me some problems. The website in question is http://www.soccerbase.com.
I have to fill in a form present at the top of the page with the player's name, then submit it for the search. I have tried it two different ways(commenting out one of them) based on info i found online but to no avail. I am an absolute novice when it comes to Perl so any help would be greatly appreciated! Thanks in advance. here is my code:
#!/usr/bin/perl
use strict;
require WWW::Mechanize;
require HTML::TokeParser;
my $player = 'Luis Antonio Valencia';
#die "Must provide a player's name" unless $player ne 1;
my $agent = WWW::Mechanize->new();
$agent->get('http://www.soccerbase.com/players/home.sd');
$agent->form_name('headSearch');
$agent->set_fields('searchTeamField', $player);
$agent->click_button(name=>"Search");
#$agent->submit_form(
# form_number => 1,
# fields => { => 'Luis Antonio Valencia', }
# );
my $stream = HTML::TokeParser->new(\$agent->{content});
my $player_name;
$stream->get_tag("strong");
$player_name = $stream->get_trimmed_text("/strong");
print "\n", "Player Name: ", $player_name, "\n";

It's a bit tricky because the form action plays switcharoo with Javascript, but HTML::Form is able to handle that perfectly fine:
#!/usr/bin/env perl
use WWW::Mechanize qw();
use URI qw();
my $player = 'Luis Antonio Valencia';
my $agent = WWW::Mechanize->new;
$agent->get('http://www.soccerbase.com/players/home.sd');
my $form = $agent->form_id('headSearch');
{
my $search_uri = $agent->uri;
$search_uri->path('/players/search.sd');
$form->action($search_uri);
# requires absolute URI
}
$agent->submit_form(
fields => {
search => $player,
type => 'player',
}
);

Easier way is to look at the HTTP request it makes, for instance:
http://www.soccerbase.com/players/search.sd?search=kkkk&type=player
'kkkk' is the player name, use LWP::UserAgent to make that request, and it will give you the result, change the 'kkk' to the name of the player you are looking to get info for, and that will do the job, using Mech for that is an overkill, if you ask me, make sure that if the player name has spaces,etc encode it.

It looks like the form elements do not have name attributes and I am assuming the query string is formed by some other means by translating the id attributes to yield:
http://www.soccerbase.com/players/search.sd?search=Luis+Antonio+Valencia&type=player
You'd think the following would work, but it doesn't suggesting that there is some other JavaScript goodness(!) happening behind the scenes.
#!/usr/bin/env perl
use strict;
use warnings;
use HTML::TableExtract;
use LWP::Simple qw(get);
use URI;
my $player = 'Luis Antonio Valencia';
my $uri = URI->new('http://www.soccerbase.com/players/home.sd');
$uri->query_form(
search => $player,
type => 'player',
);
my $content = get "$uri";
die "Failed to get '$uri'\n" unless defined $content;
my $te = HTML::TableExtract->new(
attribs => { class => 'clubInfo' },
);
$te->parse($content);
die unless $te->tables;
my ($table) = $te->tables;
my ($row) = $table->rows;
print $row->[1], "\n";

Related

HTML::TableExtract an HTTPS site

I've created a perl script to use HTML::TableExtract to scrape data from tables on a site.
It works great to dump out table data for unsecured sites (i.e. HTTP site), but when I try HTTPS sites, it doesn't work (the tables_report line just prints blank.. it should print a bunch of table data).
However, if I take the content of that HTTPS page, and save it to an html file and then post it on an unsecured HTTP site (and change my content to point to this HTTP page), this script works as expected.
Anyone know how I can get this to work over HTTPS?
#!/usr/bin/perl
use lib qw( ..);
use HTML::TableExtract;
use LWP::Simple;
use Data::Dumper;
# DOESN'T work:
my $content = get("https://datatables.net/");
# DOES work:
# my $content = get("http://www.w3schools.com/html/html_tables.asp");
my $te = HTML::TableExtract->new();
$te->parse($content);
print $te->tables_report(show_content=>1);
print "\n";
print "End\n";
The sites mentioned above for $content are just examples.. these aren't really the sites I'm extracting, but they work just like the site I'm really trying to scrape.
One option I guess is for me to use perl to download the page locally first and extract from there, but I'd rather not, if there's an easier way to do this (anyone that helps, please don't spend any crazy amount of time coming up with a complicated solution!).
The problem is related to the user agent that LWP::Simple uses, which is stopped at that site. Use LWP::UserAgent and set an allowed user agent, like this:
use strict;
use warnings;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
my $url = 'https://datatables.net/';
$ua->agent("Mozilla/5.0"); # set user agent
my $res = $ua->get($url); # send request
# check the outcome
if ($res->is_success) {
# ok -> I simply print the content in this example, you should parse it
print $res->decoded_content;
}
else {
# ko
print "Error: ", $res->status_line, "\n";
}
This is because datatables.net is blocking LWP::Simple requests. You can confirm this by using below code:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
print is_success(getprint("https://datatables.net/"));
Output:
$ perl test.pl
403 Forbidden <URL:https://datatables.net/>
You could try using LWP::RobotUA. Below code works fine for me.
#!/usr/bin/perl
use strict;
use warnings;
use LWP::RobotUA;
use HTML::TableExtract;
my $ua = LWP::RobotUA->new( 'bot_chankey/1.1', 'chankeypathak#stackoverflow.com' );
$ua->delay(5/60); # 5 second delay between requests
my $response = $ua->get('https://datatables.net/');
if ( $response->is_success ) {
my $te = HTML::TableExtract->new();
$te->parse($response->content);
print $te->tables_report(show_content=>1);
}
else {
die $response->status_line;
}
In the end, a combination of Miguel and Chankey's responses provided my solution. Miguel's made up most of my code, so I selected that as the answer, but here is my "final" code (got a lot more to do, but this is all I couldn't figure out.. the rest should be no problem).
I couldn't quite get either mentioned by Miguel/Chankey to work, but they got me 99% of the way.. then I just had to figure out how to get around the error "certificate verify failed". I found that answer with Miguel's method right away, so in the end, I mostly used his code, but both responses were great!
#!/usr/bin/perl
use lib qw( ..);
use strict;
use warnings;
use LWP::UserAgent;
use HTML::TableExtract;
use LWP::RobotUA;
use Data::Dumper;
my $ua = LWP::UserAgent->new(
ssl_opts => { SSL_verify_mode => 'SSL_VERIFY_PEER' },
);
my $url = 'https://WebsiteIUsedWasSomethingElse.com';
$ua->agent("Mozilla/5.0"); # set user agent
my $res = $ua->get($url); # send request
# check the outcome
if ($res->is_success)
{
my $te = HTML::TableExtract->new();
$te->parse($res->content);
print $te->tables_report(show_content=>1);
}
else {
# ko
print "Error: ", $res->status_line, "\n";
}
my $url = "https://ohsesfire01.summit.network/reports/slices";
my $user = 'xxxxxx';
my $pass = 'xxxxxx';
my $ua = new LWP::UserAgent;
my $request = new HTTP::Request GET=> $url;
# authenticate
$request->authorization_basic($user, $pass);
my $page = $ua->request($request);

WWW:Facebook::API used in perl

I am getting www:Facebook:api in perl and CPAN
error while using the Use of uninitialized value within %field in hash element at /usr/share/perl5/WWW/Facebook/API/Auth.pm line 62.
i defined all keys
#!/usr/bin/perl -w
use strict;
use warnings;
use CGI;
use WWW::Facebook::API;
use WWW::Facebook::API::Auth;
use HTTP::Request;
use LWP;
my $TMP = $ENV{HOME}.'/tmp';
my $facebook_api = '--------';
my $facebook_secret = '-------';
my $facebook_clientid = '--------';
my $gmail_user = '-------';
my $gmail_password = '--------';
my $client = WWW::Facebook::API->new(
desktop => 1,
api_version => '1.0',
api_key => $facebook_api,
secret => $facebook_secret,
throw_errors => 1,
);
$client->app_id($facebook_clientid);
local $SIG{INT} = sub {
print "Logging out of Facebookn";
my $r = $client->auth->logout;
exit(1);
};
my $token = $client->auth->create_token;
print "$token \n";
$client->auth->get_session($token);
print "$client \n";
WWW::Facebook::API doesn't look like it's been updated for a while. Line 62 of that file is:
$self->base->{ $field{$key} } = $resp->{$key};
The undefined value is the $field{$key} part. The %fieldhash is a hard-coded mapping between the names of Facebook API's known fields (i.e. the fields in the data Facebook returns to you) and the names which the module wants them to be called. It seems that Facebook has added some additional fields to its data, and the module has not been updated to deal with them.
Ultimately, this is just a warning; you can just ignore it if you like. If you want your script's output to be a bit tidier, you could change that line to:
$self->base->{ $field{$key} } = $resp->{$key} if defined $field{$key};

Retrieve a value from object in perl

I am getting
$VAR1 = bless( \*{'Fh::fh00001Screenshot.png'}, 'Fh' );
in a variable. But I need to retrieve fh00001Screenshot.png from it. How can I get it?
The Fh package is used internally by the CGI module to handle temporary files used for building multipart data. You shouldn't be using it directly.
Check carefully to make sure there is no better way before using this code which comes from the CGI code for Fh::asString
(my $name = $$VAR1) =~ s/^\*(\w+::fh\d{5})+//;
print $name;
output
Screenshot.png
Update
Rather than picking bits out of the CGI code, it looks like this package - which should really be a private one - is accessible from calling code. Use just $var->asString instead, like this
use strict;
use warnings;
use CGI;
my $var = do {
no strict 'refs';
my $var = bless( \*{'Fh::fh00001Screenshot.png'}, 'Fh' );
};
print $var->asString;

using Perl to scrape a website

I am interested in writing a perl script that goes to the following link and extracts the number 1975: https://familysearch.org/search/collection/results#count=20&query=%2Bevent_place_level_1%3ACalifornia%20%2Bevent_place_level_2%3A%22San%20Diego%22%20%2Bbirth_year%3A1923-1923~%20%2Bgender%3AM%20%2Brace%3AWhite&collection_id=2000219
That website is the amount of white men born in the year 1923 who live in San Diego County, California in 1940. I am trying to do this in a loop structure to generalize over multiple counties and birth years.
In the file, locations.txt, I put the list of counties, such as San Diego County.
The current code runs, but instead of the # 1975, it displays unknown. The number 1975 should be in $val\n.
I would very much appreciate any help!
#!/usr/bin/perl
use strict;
use LWP::Simple;
open(L, "locations26.txt");
my $url = 'https://familysearch.org/search/collection/results#count=20&query=%2Bevent_place_level_1%3A%22California%22%20%2Bevent_place_level_2%3A%22%LOCATION%%22%20%2Bbirth_year%3A%YEAR%-%YEAR%~%20%2Bgender%3AM%20%2Brace%3AWhite&collection_id=2000219';
open(O, ">out26.txt");
my $oldh = select(O);
$| = 1;
select($oldh);
while (my $location = <L>) {
chomp($location);
$location =~ s/ /+/g;
foreach my $year (1923..1923) {
my $u = $url;
$u =~ s/%LOCATION%/$location/;
$u =~ s/%YEAR%/$year/;
#print "$u\n";
my $content = get($u);
my $val = 'unknown';
if ($content =~ / of .strong.([0-9,]+)..strong. /) {
$val = $1;
}
$val =~ s/,//g;
$location =~ s/\+/ /g;
print "'$location',$year,$val\n";
print O "'$location',$year,$val\n";
}
}
Update: API is not a viable solution. I have been in contact with the site developer. The API does not apply to that part of the webpage. Hence, any solution pertaining to JSON will not be applicbale.
It would appear that your data is generated by Javascript and thus LWP cannot help you. That said, it seems that the site you are interested in has a developer API: https://familysearch.org/developers/
I recommend using Mojo::URL to construct your query and either Mojo::DOM or Mojo::JSON to parse XML or JSON results respectively. Of course other modules will work too, but these tools are very nicely integrated and let you get started quickly.
You could use WWW::Mechanize::Firefox to process any site that could be loaded by Firefox.
http://metacpan.org/pod/WWW::Mechanize::Firefox::Examples
You have to install the Mozrepl plugin and you will be able to process the web page contant via this module. Basically you will "remotly control" the browser.
Here is an example (maybe working)
use strict;
use warnings;
use WWW::Mechanize::Firefox;
my $mech = WWW::Mechanize::Firefox->new(
activate => 1, # bring the tab to the foreground
);
$mech->get('https://familysearch.org/search/collection/results#count=20&query=%2Bevent_place_level_1%3ACalifornia%20%2Bevent_place_level_2%3A%22San%20Diego%22%20%2Bbirth_year%3A1923-1923~%20%2Bgender%3AM%20%2Brace%3AWhite&collection_id=2000219',':content_file' => 'main.html');
my $retries = 10;
while ($retries-- and ! $mech->is_visible( xpath => '//*[#class="form-submit"]' )) {
print "Sleep until we find the thing\n";
sleep 2;
};
die "Timeout" if 0 > $retries;
#fill out the search form
my #forms = $mech->forms();
#<input id="census_bp" name="birth_place" type="text" tabindex="0"/>
#A selector prefixed with '#' must match the id attribute of the input. A selector prefixed with '.' matches the class attribute. A selector prefixed with '^' or with no prefix matches the name attribute.
$mech->field( birth_place => 'value_for_birth_place' );
# Click on the submit
$mech->click({xpath => '//*[#class="form-submit"]'});
If you use your browser's development tools, you can clearly see the JSON request that the page you link to uses to get the data you're looking for.
This program should do what you want. I've added a bunch of comments for readability and explanation, as well as made a few other changes.
use warnings;
use strict;
use LWP::UserAgent;
use JSON;
use CGI qw/escape/;
# Create an LWP User-Agent object for sending HTTP requests.
my $ua = LWP::UserAgent->new;
# Open data files
open(L, 'locations26.txt') or die "Can't open locations: $!";
open(O, '>', 'out26.txt') or die "Can't open output file: $!";
# Enable autoflush on the output file handle
my $oldh = select(O);
$| = 1;
select($oldh);
while (my $location = <L>) {
# This regular expression is like chomp, but removes both Windows and
# *nix line-endings, regardless of the system the script is running on.
$location =~ s/[\r\n]//g;
foreach my $year (1923..1923) {
# If you need to add quotes around the location, use "\"$location\"".
my %args = (LOCATION => $location, YEAR => $year);
my $url = 'https://familysearch.org/proxy?uri=https%3A%2F%2Ffamilysearch.org%2Fsearch%2Frecords%3Fcount%3D20%26query%3D%252Bevent_place_level_1%253ACalifornia%2520%252Bevent_place_level_2%253A^LOCATION^%2520%252Bbirth_year%253A^YEAR^-^YEAR^~%2520%252Bgender%253AM%2520%252Brace%253AWhite%26collection_id%3D2000219';
# Note that values need to be doubly-escaped because of the
# weird way their website is set up (the "/proxy" URL we're
# requesting is subsequently loading some *other* URL which
# is provided to "/proxy" as a URL-encoded URL).
#
# This regular expression replaces any ^WHATEVER^ in the URL
# with the double-URL-encoded value of WHATEVER in %args.
# The /e flag causes the replacement to be evaluated as Perl
# code. This way I can look data up in a hash and do URL-encoding
# as part of the regular expression without an extra step.
$url =~ s/\^([A-Z]+)\^/escape(escape($args{$1}))/ge;
#print "$url\n";
# Create an HTTP request object for this URL.
my $request = HTTP::Request->new(GET => $url);
# This HTTP header is required. The server outputs garbage if
# it's not present.
$request->push_header('Content-Type' => 'application/json');
# Send the request and check for an error from the server.
my $response = $ua->request($request);
die "Error ".$response->code if !$response->is_success;
# The response should be JSON.
my $obj = from_json($response->content);
my $str = "$args{LOCATION},$args{YEAR},$obj->{totalHits}\n";
print O $str;
print $str;
}
}
What about this simple script without firefox ? I had investigated the site a bit to understand how it works, and I saw some JSON requests with firebug firefox addon, so I know which URL to query to get the relevant stuff. Here is the code :
use strict; use warnings;
use JSON::XS;
use LWP::UserAgent;
use HTTP::Request;
my $ua = LWP::UserAgent->new();
open my $fh, '<', 'locations2.txt' or die $!;
open my $fh2, '>>', 'out2.txt' or die $!;
# iterate over locations from locations2.txt file
while (my $place = <$fh>) {
# remove line ending
chomp $place;
# iterate over years
foreach my $year (1923..1925) {
# building URL with the variables
my $url = "https://familysearch.org/proxy?uri=https%3A%2F%2Ffamilysearch.org%2Fsearch%2Frecords%3Fcount%3D20%26query%3D%252Bevent_place_level_1%253ACalifornia%2520%252Bevent_place_level_2%253A%2522$place%2522%2520%252Bbirth_year%253A$year-$year~%2520%252Bgender%253AM%2520%252Brace%253AWhite%26collection_id%3D2000219";
my $request = HTTP::Request->new(GET => $url);
# faking referer (where we comes from)
$request->header('Referer', 'https://familysearch.org/search/collection/results');
# setting expected format header for response as JSON
$request->header('content_type', 'application/json');
my $response = $ua->request($request);
if ($response->code == 200) {
# this line convert a JSON to Perl HASH
my $hash = decode_json $response->content;
my $val = $hash->{totalHits};
print $fh2 "year $year, place $place : $val\n";
}
else {
die $response->status_line;
}
}
}
END{ close $fh; close $fh2; }
This seems to do what you need. Instead of waiting for the disappearance of the hourglass it waits - more obviously I think - for the appearance of the text node you're interested in.
use 5.010;
use warnings;
use WWW::Mechanize::Firefox;
STDOUT->autoflush;
my $url = 'https://familysearch.org/search/collection/results#count=20&query=%2Bevent_place_level_1%3ACalifornia%20%2Bevent_place_level_2%3A%22San%20Diego%22%20%2Bbirth_year%3A1923-1923~%20%2Bgender%3AM%20%2Brace%3AWhite&collection_id=2000219';
my $mech = WWW::Mechanize::Firefox->new(tab => qr/FamilySearch\.org/, create => 1, activate => 1);
$mech->autoclose_tab(0);
$mech->get('about:blank');
$mech->get($url);
my $text;
while () {
sleep 1;
$text = $mech->xpath('//p[#class="num-search-results"]/text()', maybe => 1);
last if defined $text;
}
my $results = $text->{nodeValue};
say $results;
if ($results =~ /([\d,]+)\s+results/) {
(my $n = $1) =~ tr/,//d;
say $n;
}
output
1-20 of 1,975 results
1975
Update
This update is with special thanks to #nandhp, who inspired me to look at the underlying data server that produces the data in JSON format.
Rather than making a request via the superfluous https://familysearch.org/proxy this code accesses the server directly at https://familysearch.org/search/records, reencodes the JSON and dumps the required data out of the resulting structure. This has the advantage of both speed (the requests are served about once a second - more than ten times faster than with the equivalent request from the basic web site) and stability (as you note, the site is very flaky - in contrast I have never seen an error using this method).
use strict;
use warnings;
use LWP::UserAgent;
use URI;
use JSON;
use autodie;
STDOUT->autoflush;
open my $fh, '<', 'locations26.txt';
my #locations = <$fh>;
chomp #locations;
open my $outfh, '>', 'out26.txt';
my $ua = LWP::UserAgent->new;
for my $county (#locations[36, 0..2]) {
for my $year (1923 .. 1926) {
my $total = familysearch_info($county, $year);
print STDOUT "$county,$year,$total\n";
print $outfh "$county,$year,$total\n";
}
print "\n";
}
sub familysearch_info {
my ($county, $year) = #_;
my $query = join ' ', (
'+event_place_level_1:California',
sprintf('+event_place_level_2:"%s"', $county),
sprintf('+birth_year:%1$d-%1$d~', $year),
'+gender:M',
'+race:White',
);
my $url = URI->new('https://familysearch.org/search/records');
$url->query_form(
collection_id => 2000219,
count => 20,
query => $query);
my $resp = $ua->get($url, 'Content-Type'=> 'application/json');
my $data = decode_json($resp->decoded_content);
return $data->{totalHits};
}
output
San Diego,1923,1975
San Diego,1924,2004
San Diego,1925,1871
San Diego,1926,1908
Alameda,1923,3577
Alameda,1924,3617
Alameda,1925,3567
Alameda,1926,3464
Alpine,1923,1
Alpine,1924,2
Alpine,1925,0
Alpine,1926,1
Amador,1923,222
Amador,1924,248
Amador,1925,134
Amador,1926,67
I do not know how to post revised code from the solution above.
This code does not (yet) compile correctly. However, I have made some essential update to definitely head in that direction.
I would very much appreciate help on this updated code. I do not know how to post this code and this follow up such that it appease the lords who run this sight.
It get stuck at the sleep line. Any advice on how to proceed past it would be much appreciated!
use strict;
use warnings;
use WWW::Mechanize::Firefox;
my $mech = WWW::Mechanize::Firefox->new(
activate => 1, # bring the tab to the foreground
);
$mech->get('https://familysearch.org/search/collection/results#count=20&query=%2Bevent_place_level_1%3ACalifornia%20%2Bevent_place_level_2%3A%22San%20Diego%22%20%2Bbirth_year%3A1923-1923~%20%2Bgender%3AM%20%2Brace%3AWhite&collection_id=2000219',':content_file' => 'main.html', synchronize => 0);
my $retries = 10;
while ($retries-- and $mech->is_visible( xpath => '//*[#id="hourglass"]' )) {
print "Sleep until we find the thing\n";
sleep 2;
};
die "Timeout while waiting for application" if 0 > $retries;
# Now the hourglass is not visible anymore
#fill out the search form
my #forms = $mech->forms();
#<input id="census_bp" name="birth_place" type="text" tabindex="0"/>
#A selector prefixed with '#' must match the id attribute of the input. A selector prefixed with '.' matches the class attribute. A selector prefixed with '^' or with no prefix matches the name attribute.
$mech->field( birth_place => 'value_for_birth_place' );
# Click on the submit
$mech->click({xpath => '//*[#class="form-submit"]'});
You should set the current form before accessing a field:
"Given the name of a field, set its value to the value specified. This applies to the current form (as set by the "form_name()" or "form_number()" method or defaulting to the first form on the page)."
$mech->form_name( 'census-search' );
$mech->field( birth_place => 'value_for_birth_place' );
Sorry, I am not able too try this code out and thanks for open a question for a new question.

How do I encode HTTP GET query strings in Perl?

This question is somewhat related to What’s the simplest way to make a HTTP GET request in Perl?.
Before making the request via LWP::Simple I have a hash of query string components that I need to serialize/escape. What's the best way to encode the query string? It should take into account spaces and all the characters that need to be escaped in valid URIs. I figure it's probably in an existing package, but I'm not sure how to go about finding it.
use LWP::Simple;
my $base_uri = 'http://example.com/rest_api/';
my %query_hash = (spam => 'eggs', foo => 'bar baz');
my $query_string = urlencode(query_hash); # Part in question.
my $query_uri = "$base_uri?$query_string";
# http://example.com/rest_api/?spam=eggs&foo=bar+baz
$contents = get($query_uri);
URI::Escape is probably the most direct answer, as other have given, but I would recommend using a URI object for the entire thing. URI automatically escapes the GET parameters for you (using URI::Escape).
my $uri = URI->new( 'http://example.com' );
$uri->query_form(foo => '1 2', bar => 2);
print $uri; ## http://example.com?foo=1+2&bar=2
As an added bonus, LWP::Simple's get function will take a URI object as it's argument instead of a string.
URI::Escape does what you want.
use URI::Escape;
sub escape_hash {
my %hash = #_;
my #pairs;
for my $key (keys %hash) {
push #pairs, join "=", map { uri_escape($_) } $key, $hash{$key};
}
return join "&", #pairs;
}
URI is far simpler than URI::Escape for this. The method query_form() accepts a hash or a hashref:
use URI;
my $full_url = URI->new('http://example.com');
$full_url->query_form({"id" => 27, "order" => "my key"});
print "$full_url\n"; # http://example.com?id=27&order=my+key
Use the module URI to build the URL with the query parameters:
use LWP::Simple;
use URI;
my $uri_object = URI->new('http://example.com/rest_api/');
$uri_object->query_form(spam => 'eggs', foo => 'bar baz');
$contents = get("$uri_object");
I found this solution here.
Use LWP::UserAgent instead:
use strict;
use warnings;
use LWP::UserAgent;
my %query_hash = (spam => 'eggs', foo => 'bar baz');
my $ua = LWP::UserAgent->new();
my $resp = $ua->get("http://www.foobar.com", %query_hash);
print $resp->content;
It takes care of the encoding for you.
If you want a more generic encoding solution, see HTML::Entities.
EDIT: URI::Escape is a better choice.
URI::Escape is the module you are probably thinking of.