URI::URL Not Getting Parameter - perl

# The url I'm on: https://development.cherrylanekeepsakes.com/cgi-bin/employees.cgi?action=edit_timeclock_dashboard&id=80&start_date=2012-09-01&end_date=2012-09-16&print_view=1
use URI::URL;
use Data::Dumper;
my $url = URI::URL->new( '' . $cgi->new->url(-path_info => 1, -query => 1) );
warn Dumper($url->params('print_view'));
It gives me nothing. What am I doing wrong? This seems like a pretty simple task.

Is there a reason you're using URI::URL instead of URI? It's an obsolete module that only exists for backwards compatibility. It's not even documented, so I can't even confirm that params is suppose to do what you think it does.
What follows is a solution using the module that replaced URI::URL. It's even part of the same distribution.
use URI qw( );
my $url = URI->new('https://...');
my %query_form = $url->query_form();
say $query_form{print_view};
Or better yet,
use URI qw( );
use URI::QueryParam qw( );
my $url = URI->new('https://...');
say $url->query_param('print_view');
Note: To assign one of the value of query_param to a scalar, you need to use parens as follows:
my ($print_view) = $url->query_param('print_view');

As per #ikegami's recommendation, I am now using URI, instead of obsolete URL::URI.
# https://development.cherrylanekeepsakes.com/cgi-bin/employees.cgi?action=edit_timeclock_dashboard&id=80&start_date=2012-09-01&end_date=2012-09-16&print_view=1
use URI;
use URI::QueryParam;
my $url = URI->new('' . $cgi->new->url(-path_info => 1, -query => 1));
warn $url->query_param('print_view'); # prints 1 as expected

The reason your code displays nothing is that your URL has no param fields - only a set of query fields.
A URL looks roughly like
scheme://host:port/path1/path2;param1;param2?query1=A&query2=B#fragment
and there are no semicolons in your URL

Related

How may I bypass LWP's URL encoding for a GET request?

I'm talking to what seems to be a broken HTTP daemon and I need to make a GET request that includes a pipe | character in the URL.
LWP::UserAgent escapes the pipe character before the request is sent.
For example, a URL passed in as:
https://hostname/url/doSomethingScript?ss=1234&activities=Lec1|01
is passed to the HTTP daemon as
https://hostname/url/doSomethingScript?ss=1234&activities=Lec1%7C01
This is correct, but doesn't work with this broken server.
How can I override or bypass the encoding that LWP and its friends are doing?
Note
I've seen and tried other answers here on StackOverflow addressing similar problems. The difference here seems to be that those answers are dealing with POST requests where the formfield parts of the URL can be passed as an array of key/value pairs or as a 'Content' => $content parameter. Those approaches aren't working for me with an LWP request.
I've also tried constructing an HTTP::Request object and passing that to LWP, and passing the full URL direct to LWP->get(). No dice with either approach.
In response to Borodin's request, this is a sanitised version of the code I'm using
#!/usr/local/bin/perl -w
use HTTP::Cookies;
use LWP;
my $debug = 1;
# make a 'browser' object
my $browser = LWP::UserAgent->new();
# cookie handling...
$browser->cookie_jar(HTTP::Cookies->new(
'file' => '.cookie_jar.txt',
'autosave' => 1,
'ignore_discard' => 1,
));
# proxy, so we can watch...
if ($debug == 1) {
$browser->proxy(['http', 'ftp', 'https'], 'http://localhost:8080/');
}
# user agent string (pretend to be Firefox)
$agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.7.12) Gecko/20050919 Firefox/1.0.7';
# set the user agent
$browser->agent($agent);
# do some things here to log in to the web site, accept session cookies, etc.
# These are basic POSTs of filled forms. Works fine.
# [...]
my $baseURL = 'https://hostname/url/doSomethingScript?ss=1234&activities=VALUEA|VALUEB';
#values = ['Lec1', '01', 'Lec1', '02'];
while (1) {
if (scalar(#values) < 2) { last; }
my $vala = shift(#values);
my $valb = shift(#values);
my $url = $basEURL;
$url =~ s/VALUEA/$vala/g;
$url =~ s/VALUEB/$valb/g;
# simplified. Would usually check request for '200' response, etc...
$content = $browser->get($url)->content();
# do something here with the content
# [...]
# fails because the '|' character in the url is escaped after it's handed
# to LWP
}
# end
As #bchgys mentions in his comment, this is (almost) answered in the linked thread. Here are two solutions:
The first and arguably cleanest one is to locally override the escape map in URI::Escape to not modify the pipe character:
use URI;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new();
my $res;
{
# Violate RFC 2396 by forcing broken query string
# local makes the override take effect only in the current code block
local $URI::Escape::escapes{'|'} = '|';
$res = $ua->get('http://server/script?q=a|b');
}
print $res->request->as_string, "\n";
Alternatively, you can simply undo the escaping by modifying the URI directly in the request after the request has been created:
use HTTP::Request;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new();
my $req = HTTP::Request->new(GET => 'http://server/script?q=a|b');
# Violate RFC 2396 by forcing broken query string
${$req->uri} =~ s/%7C/|/;
my $res = $ua->request($req);
print $res->request->as_string, "\n";
The first solution is almost certainly preferable because it at least relies on the %URI::Escape::escapes package variable which is exported and documented, so that's probably as close as you're gonna get to doing this with a supported API.
Note that in either case you are in violation of RFC 2396 but as mentioned you may have no choice when talking to a broken server that you have no control over.

Removing top-directory-only URLs from a list of URLs?

I have a question that I'm having trouble researching, as I don't know how to ask it correctly on a search engine.
I have a list of URLs. I would like to have some automated way (Perl for preference) to go through the list and remove all URLs that are top directory only.
So for example I might have this list:
http://www.example.com/hello.html
http://www.foo.com/this/thingrighthere.html
In this case I would want to remove example.com from my list, as it is either top-directory only or they reference files in a top directory.
I'm trying to figure out how to do that. My first thought was, count forward slashes and if there's more than two, eliminate the URL from the list. But then you have trailing forward slashes, so that wouldn't work.
Any ideas or thoughts would be much appreciated.
Something like this:
use URI::Split qw( uri_split );
my $url = "http://www.foo.com/this/thingrighthere.html";
my ($scheme, $auth, $path, $query, $frag) = uri_split( $url );
if (($path =~ tr/\///) > 1 ) {
print "I care about this $url";
}
http://metacpan.org/pod/URI::Split
You could do this with regexes, but its much less work to let the URI library do it for you. You won't get caught out by funny schemes, escapes, and extra stuff before and after the path (query, anchor, authorization...). There's some trickiness around how paths are represented by path_segments(). See the comments below and the URI docs for details.
I have assumed that http://www.example.com/foo/ is considered a top directory. Adjust as necessary, but its something you have to think about.
#!/usr/bin/env perl
use URI;
use File::Spec;
use strict;
use warnings;
use Test::More 'no_plan';
sub is_top_level_uri {
my $uri = shift;
# turn it into a URI object if it isn't already
$uri = URI->new($uri) unless eval { $uri->isa("URI") };
# normalize it
$uri = $uri->canonical;
# split the path part into pieces
my #path_segments = $uri->path_segments;
# for an absolute path, which most are, the absoluteness will be
# represented by an empty string. Also /foo/ will come out as two elements.
# Strip that all out, it gets in our way for this purpose.
#path_segments = grep { $_ ne '' } #path_segments;
return #path_segments <= 1;
}
my #filtered_uris = (
"http://www.example.com/hello.html",
"http://www.example.com/",
"http://www.example.com",
"https://www.example.com/",
"https://www.example.com/foo/#extra",
"ftp://www.example.com/foo",
"ftp://www.example.com/foo/",
"https://www.example.com/foo/#extra",
"https://www.example.com/foo/?extra",
"http://www.example.com/hello.html#extra",
"http://www.example.com/hello.html?extra",
"file:///foo",
"file:///foo/",
"file:///foo.txt",
);
my #unfiltered_uris = (
"http://www.foo.com/this/thingrighthere.html",
"https://www.example.com/foo/bar",
"ftp://www.example.com/foo/bar/",
"file:///foo/bar",
"file:///foo/bar.txt",
);
for my $uri (#filtered_uris) {
ok is_top_level_uri($uri), $uri;
}
for my $uri (#unfiltered_uris) {
ok !is_top_level_uri($uri), $uri;
}
Use the URI module from CPAN. http://search.cpan.org/dist/URI
This is a solved problem. People have already written, tested and debugged code that handles this already. Whenever you have a programming problem that others have probably had to deal with, then look for existing code that does it for you.

WWW::Mechanize text field issue

I'm trying to submit a form by post method using WWW::Mechanize perl module.
use WWW::Mechanize;
my $mech = WWW::Mechanize->new();
...
$mech->get($url);
...
my $response = $mech->submit_form(
form_name => $name,
fields => {
$field_name => $field_value
},
button => 'Button'
);
$field_name is generally speaking a text field (though the type is not specified explicitly in the form), which has a preset value.
$field_name => $field_value in $mech->submit_form on whatever reason does not replace the value, instead $field_value is added into the form after the original value:
{submitted_field_value} = {original_value},{provided_value}
How to replace {original_value} with {provided_value} in the form to be submitted ?
What happens if you add this single line to your code before calling $mech->submit_form():
$mech->field( $name, [$field_value], 1 );
This makes sure that the first value is added, or overwritten if it already exists.
1 is the number parameter (or position index)
See the documentation of WWW::Mechanize:
$mech->field( $name, \#values, $number )
Given the name of a field, set its value to the value specified. [...]
The optional $number parameter is used to distinguish between two
fields with the same name. The fields are numbered from 1.
It's important to remember WWW::Mechanize is better thought of as a 'headless browser' as opposed to say LWP or curl, which only handle all the fiddly bits of http requests for you. Mech keeps its state as you do things.
You'll need to get the form by using $mech->forms or something similar (its best to decide from the documentation. I mean there so many ways to do it.), and then set the input field you want to change, using the field methods.
I guess the basic way to do this comes out as so:
$mech->form_name($name);
$mech->field($field_name, $field_value);
my $response = $mech->click('Button');
Should work. I believe it will also work if you get the field and directly use that (ie my $field = $mech->form_name($name); then use $field methods instead of $mech.
I managed to make it working at my will. Thanks Timbus and knb for your suggestions. Though my case may not be completely general (I know the preset value) but I'd share what I've found (by trails & errors).
my $mech = WWW::Mechanize->new();
$mech->get($url);
$mech->form_name( $name );
my $fields = $mech->form_name($name);
foreach my $k ( #{$fields->{inputs}}){
if ($k->{value} eq $default_value){
$k->{value}=$field_value;
}
}
my $response = $mech->click('Button_name');

Having problem setting parameters for LWP::UserAgent

my %parameters = (
key => 'value'
);
my $response = $ua->get('http://example.com/i', %parameters);
I'm trying to get content of http://example.com/i?key=value,but after debugging I found the %parameters are stored in http headers instead of url parameters.
What's wrong in my code?
Though perldoc tells me that :
$ua->get( $url , $field_name => $value, ... )
But it should also work if I put those parameters in a %parameters,right?
The additional parameters to get are HTTP headers. For GET requests, arguments are included in the URL itself, URLencoded. You can use the URI module to create the appropriate URLs including GET variables, or construct them yourself (probably using URI::Escape to urlencode the values).
e.g.:
my %parameters = (
key => 'value'
);
my $url = URI->new("http://example.com/i");
$url->query_form(%parameters);
my $response = $ua->get($url);
From the fine manual:
$ua->get( $url )
$ua->get( $url , $field_name => $value, ... )
This method will dispatch a GET request on the given $url. Further arguments can be given to initialize the headers of the request.
Emphasis mine. You're misreading the documentation, the extra parameters for get() are HTTP header fields, not CGI parameters. If you want to include some CGI parameters then you'll have to add them to the URI yourself (preferably with URI).

How do I pass a variable into a URL in a Perl script?

How do I pass a variable into a URL in a Perl script?
I am trying to pass the variables in an array to url. For some reason it is not working. I am not sure what I am doing wrong. The code roughly looks like this:
#coins = qw(Quarter Dime Nickel);
foreach (#coins) {
my $req = HTTP::Request->new(POST =>'https://url/$coins.com');
}
This does not work as $coins does not switch to Quarter,Dime,Nickel respectively.
What am I doing wrong?
First, variables do not interpolate in single quoted strings:
my $req = HTTP::Request->new(POST => "https://url/$coins.com");
Second, there is no variable $coins defined anywhere:
foreach my $coin (#coins) {
my $req = HTTP::Request->new(POST => "https://url/$coin.com");
}
Also, make sure to use strict and warnings.
You should also invest some time into learning Perl properly.
Use
'https://url/' . $_ . '.com'
Instead of your
'https://url/$coins.com'