I have some URLs like http://anytext.a.abs.com
In these, 'anytext' is the data that is dynamic. Rest of the URL will remain same in every case.
I'm using the following code:
$url = "http://anytext.a.abs.com";
my $request = new HTTP::Request 'GET', $url;
my $response = $ua->request($request);
if ($response->is_success)
{
function......;
}
Now, how can I parse a URL that has dynamic data in it?
Not sure but is this close to what you're after?:
for my $host qw(anytext someothertext andanother) {
my $url = "http://$host.a.abs.com";
my $request = new HTTP::Request 'GET', $url;
my $response = $ua->request($request);
if ($response->is_success)
{
function......;
}
}
Something like this maybe?
Otherwise, you can use the URI class to do url manipulation.
my $protocol = 'http://'
my $url_end = '.a.abs.com';
$url = $protocol . "anytext" . $url_end;
my $request = new HTTP::Request 'GET', $url;
my $response = $ua->request($request);
if ($response->is_success)
{
function......;
}
I think this is probably enough:
# The regex specifies a string preceded by two slashes and all non-dots
my ( $host_name ) = $url =~ m{//([^.]+)};
And if you want to change it:
$url =~ s|^http://\K([^.]+)|$host_name_I_want|;
Or even:
substr( $url, index( $url, $host_name ), length( $host_name ), $host_name_I_want );
This will expand the segment sufficiently to accommodate $host_name_I_want.
Well, like you would parse any other data: Use the information you have about the structure.
You have a protocol part, followed by "colon slash slash", then the host followed by optional "colon port number" and an optional path on the host.
So ... build a little parser that extracts the information you are after.
And frankly, if you are only hunting for "what exactely is 'anytext' here?", a RegEx of this form should help (untested; use as guidance):
$url =~ m/http\:\/\/(.*).a.abs.com/;
$subdomain = $1;
$do_something('with', $subdomain);
Sorry if I grossly misunderstood the problem at hand. Please explain what you mean with 'how can I parse a URL that has dynamic data in it?' in that case :)
Related
Here is the error message I am receiving when I try to run the following Perl code. Any ideas what causing this? It appears it does not like the API=Verify, but it was the only name for the API which returns a 9 digit zip code I could find.
80040B1AAPI Authorization failure. is not a valid API name for this
protocol.USPSCOM::DoAuth
# Perl subroutine for POST Request
#########
sub FindTracking() {
$saddress="60 Passional Way";
$scity="Burnsville";
$szip="27690";
$sstate="NC";
print "start of tracking...<br>";
$queryString = qq~
<?xml version="1.0"?>
https://secure.shippingapis.com/ShippingAPI.dll?API=Verify&XML=
<AddressValidateRequest USERID="xxxxxxx">
<Revision>1</Revision>
<Address ID="0">
<Address1></Address1>
<Address2>$saddress</Address2>
<City>$scity</City>
<State>$sstate</State>
<Zip5>$szip</Zip5>
<Zip4></Zip4>
</Address>
</AddressValidateRequest>
<?xml version="1.0"?>
<AddressValidateResponse><Address ID="0">
<Address2></Address2><City></City><State></State><Zip5></Zip5>
<Zip4></Zip4></Address></AddressValidateResponse>
<Error>
<Number></Number>
<Source></Source>
<Description></Description>
<HelpFile></HelpFile>
<HelpContext></HelpContext>
</Error>
~;
# Instantiate the user agent and set our agent string
$userAgent = new LWP::UserAgent;
$userAgent->agent( 'USPS' );
$queryString =~ s/ /\%20/ig;
$request = new HTTP::Request( 'POST',
'https://secure.shippingapis.com/ShippingAPI.dll' );
# Set the content type
$request->content_type( 'text/xml' );
# Set the query string
$request->content( $queryString );
# Make the request
$response = $userAgent->request( $request );
print $response->content();
# Check the status of the request
if ( $response->is_success ) {
$content = $$response{ "_content" };
$TrackingNumber = "";
($success) = $content =~ /\<AddressValidateRequest\>(.*)\
<\/AddressValidateRequest\>/;
print "<br><br>==>some $content<br>";
if ($success eq "Success") {
($TrackingNumber)= $content =~ /\<Zip4\>(.*)\<\/Zip4\>/;
print "here with $TrackingNumber<br>";
}
}
else {
print "<br><br>here with resp=$response<br>req=$request <br>cont
$content";
}
}
To start, you have an invalid query string. Compare what you are doing to the USPS WebTools example and remove what they don't show. Be sure that you are setting your USERID properly (and not using 'xxxxxxx'); an environment variable is handy there.
Or, you can try using the Business::USPS::WebTools module from GitHub. It implements the Zip Code Lookup.
Change your query XML to:
$queryString = qq~
<AddressValidateRequest USERID="xxxxxxxxxxx">
<Revision>1</Revision>
<Address ID="0">
<Address1></Address1>
<Address2>$saddress</Address2>
<City>$scity</City>
<State>$sstate</State>
<Zip5>$szip</Zip5>
<Zip4></Zip4>
</Address>
</AddressValidateRequest>
~;
Then build the request as a GET instead of a POST:
$userAgent = new LWP::UserAgent;
$userAgent->agent( 'USPS' );
$url = "https://secure.shippingapis.com/ShippingAPI.dll?API=Verify&XML=$queryString";
$request = new HTTP::Request( 'GET', $url );
I have a simple code like this:
use LWP::UserAgent;
use HTTP::Cookies;
use HTTP::Request;
my $cookies = HTTP::Cookies->new();
my $browser = LWP::UserAgent->new();
$browser->agent(' ... ');
$browser->cookie_jar($cookies);
my $request = HTTP::Request->new();
my $response;
my $url;
my $referer;
$referer = '';
$url = 'https:// ...'; # url #1
$request->url($url);
$request->method('GET');
$request->header('Referer' => $referer);
$response = $browser->request($request);
print $response->request()->uri() . "\n\n" .
$response->headers()->as_string . "\n\n" .
$response->content . "\n\n";
$referer = $response->request()->uri();
$url = 'https:// ... '; # url #2
$request->url($url);
$request->method('GET');
$request->header('Referer' => $referer);
$response = $browser->request($request);
print $response->request()->uri() . "\n\n" .
$response->headers()->as_string . "\n\n" .
$response->content . "\n\n";
Now, I want to see full HTTP request headers as well, not just response headers.
How can I do it? What has to be added to this code?
I think you almost have it in your existing code. You are accessing the request URI with $response->request()->uri(). The ->request() is your HTTP::Request object. I believe that you can use $response->request->headers->as_string to get what you want.
print $response->request->as_string
This will show you requests as well as responses.
use LWP::UserAgent;
use LWP::ConsoleLogger::Easy qw( debug_ua );
my $browser = LWP::UserAgent->new();
debug_ua( $browser );
$request->headers->as_string and $response->request->headers->as_string will you get you the headers of the first and last request passed to Net::HTTP by LWP[1], but these aren't quite what Net::HTTP sends. For example, Net::HTTP can add a Content-Length header, a TE header, and/or a number of others.
Net::HTTP doesn't keep a record of the headers it actually sends. You will need a wire sniffer (e.g. tcpdump) or a debugging proxy (e.g. Fiddler) for that. You could also use a debugger or trace statements to view the request prepared in Net::HTTP::Methods's format_request. The most convenient, however, might be to wrap Net::HTTP::Methods's format_request.
These are the same unless the initial request was redirected. To get all the requests (and responses), you can use:
while ($response) {
my $request = $response->request;
...
$response = $response->previous;
}
I want to get last of redirect URL.
like
url_1 : http://on.fb.me/4VGeu
url_2 : https://www.facebook.com/
I want to get url_2 by url_1 in perl.
Previous source is below.
sub get_redirect_location
{
my ($url) = #_;
my $ua = LWP::UserAgent->new;
$ua->proxy('http', 'SAMPLE_PROXY');
my $req = new HTTP::Request(GET => $url);
my $res = $ua->request($req);
return $res->headers_as_string;
}
Thanks in advance.
You can find the request that lead to a response using
$response->request()
You can get the previous response in the chain using
$response->previous()
All together:
while ($response) {
say $response->request()->uri();
$response = $response->previous();
}
You could look at WWW::Mechanize. I have used it before to do something like this.
http://search.cpan.org/~jesse/WWW-Mechanize-1.72/lib/WWW/Mechanize.pm#$mech->redirect_ok()
You may also find this post helpful:
Perl WWW::Mechanize (or LWP) get redirect url
The following program fails when trying to go to an https web site that requires basic authentication.
use Mojo::UserAgent;
my $ua = Mojo::UserAgen->new;
my $user = "foobar";
my $pass = "Cant#change";
my $url = "https://$user:$pass\#site.foo.com";
my $tx = $ua->get($url);
if (my $res = $tx->success) {
say $res->body;
}
else {
my ($message, $code) = $tx->error;
say $code ? "$code response $message" : "Connection error: $message";
}
When I run with MOJO_USERAGENT_DEBUG=1 I get the following output:
-- Blocking request (https://foobar:cant#change#site.foo.com)
-- Connect (https:foobar:Cant:443)
Connection error: Couldn't connect
Using Mojolicious 3.35 updated from CPAN. Unfortunately, passwords will likely contain "special characters" (ascii #!#%^& and the like) and changing the password to something not containing a # is not an option. The web server handles the request correctly in web browsers, so I do not believe it is a web server configuration issue.
So is there another way to achieve this in Mojo?
The error is yours, not Mojo's. Specifically, the URL is incorrectly built. Fix:
use URI::Escape qw( uri_escape );
my $creds = uri_escape($user) . ':' . uri_escape($pass);
my $url = 'https://' . $creds . '#site.foo.com/';
use Mojo::Base -strict;
use Mojo::URL;
#1 Mojo way
my $url = Mojo::URL->new('http://google.com/')->userinfo('user:pa#ss');
say $url;
#2 or manually
use Mojo::Util qw/url_escape/;
my $auth = join ':', url_escape('user'), url_escape('pa#ss');
my $url2 = qq{http://$auth\#google.com/};
say $url2;
I have some code I've written in PHP for consuming our simple webservice, which I'd also like to provide in Perl for users who may prefer that language. What's the simplest method of making a HTTP request to do that? In PHP I can do it in one line with file_get_contents().
Here's the entire code I want to port to Perl:
/**
* Makes a remote call to the our API, and returns the response
* #param cmd {string} - command string ID
* #param argsArray {array} - associative array of argument names and argument values
* #return {array} - array of responses
*/
function callAPI( $cmd, $argsArray=array() )
{
$apikey="MY_API_KEY";
$secret="MY_SECRET";
$apiurl="https://foobar.com/api";
// timestamp this API was submitted (for security reasons)
$epoch_time=time();
//--- assemble argument array into string
$query = "cmd=" .$cmd;
foreach ($argsArray as $argName => $argValue) {
$query .= "&" . $argName . "=" . urlencode($argValue);
}
$query .= "&key=". $apikey . "&time=" . $epoch_time;
//--- make md5 hash of the query + secret string
$md5 = md5($query . $secret);
$url = $apiurl . "?" . $query . "&md5=" . $md5;
//--- make simple HTTP GET request, put the server response into $response
$response = file_get_contents($url);
//--- convert "|" (pipe) delimited string to array
$responseArray = explode("|", $response);
return $responseArray;
}
LWP::Simple:
use LWP::Simple;
$contents = get("http://YOUR_URL_HERE");
LWP::Simple has the function you're looking for.
use LWP::Simple;
$content = get($url);
die "Can't GET $url" if (! defined $content);
Take a look at LWP::Simple.
For more involved queries, there's even a book about it.
I would use the LWP::Simple module.
Mojo::UserAgent is a great option too!
use Mojo::UserAgent;
my $ua = Mojo::UserAgent->new;
# Say hello to the Unicode snowman with "Do Not Track" header
say $ua->get('www.☃.net?hello=there' => {DNT => 1})->res->body;
# Form POST with exception handling
my $tx = $ua->post('https://metacpan.org/search' => form => {q => 'mojo'});
if (my $res = $tx->success) { say $res->body }
else {
my ($err, $code) = $tx->error;
say $code ? "$code response: $err" : "Connection error: $err";
}
# Quick JSON API request with Basic authentication
say $ua->get('https://sri:s3cret#example.com/search.json?q=perl')
->res->json('/results/0/title');
# Extract data from HTML and XML resources
say $ua->get('www.perl.org')->res->dom->html->head->title->text;`
Samples direct from CPAN page. I used this when I couldn 't get LWP::Simple to work on my machine.
Try the HTTP::Request module.
Instances of this class are usually passed to the request() method of an LWP::UserAgent object.
If it's in Unix and if LWP::Simple isn't installed, you can try:
my $content = `GET "http://trackMyPhones.com/"`;
I think what Srihari might be referencing is Wget, but I would actually recommend (again, on *nix without LWP::Simple) to use cURL:
$ my $content = `curl -s "http://google.com"`;
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
here.
</BODY></HTML>
The -s flag tells curl to be silent. Otherwise, you get curl's progress bar output on standard error every time.