How do I create an absolute URL from two components, in Perl? - perl

Suppose I have:
my $a = "http://site.com";
my $part = "index.html";
my $full = join($a,$part);
print $full;
>> http://site.com/index.html
What do I have to use as join, in order to get my snippet to work?
EDIT: I'm looking for something more general. What if a ends with a slash, and part starts with one? I'm sure in some module, someone has this covered.

I believe what you're looking for is URI::Split, e.g.:
use URI::Split qw(uri_join);
my $uri = uri_join('http', 'site.com', 'index.html');

use URI;
URI->new("index.html")->abs("http://site.com")
will produce
"http://site.com/index.html"
URI->abs will take care of merging the paths properly, following the URI specification,
so
URI->new("/bah")->abs("http://site.com/bar")
will produce
"http://site.com/bah"
and
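URI->new("index.html")->abs("http://site.com/barf/")
will produce
"http://site.com/barf/index.html"
(note the trailing slash on the base: against "http://site.com/barf" the last path segment is replaced, giving "http://site.com/index.html")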
and
URI->new("../uplevel/foo")->abs("http://site.com/foo/bar/barf")
will produce
"http://site.com/foo/uplevel/foo"
Alternatively, there's a shortcut constructor in the URI namespace that I just noticed:
URI->new_abs($url, $base_url)
so
URI->new_abs("index.html", "http://site.com")
will produce
"http://site.com/index.html"
and so on.

No need for join, just use string interpolation.
my $a = "http://site.com";
my $part = "index.html";
my $full = "$a/$part";
print $full;
>> http://site.com/index.html
Update:
Not everything requires a module. CPAN is wonderful, but restraint is needed.
The simple approach above works very well if you have clean inputs. If you need to handle unevenly formatted strings, you will need to normalize them somehow. Using a library in the URI namespace that meets your needs is probably the best course of action if you need to handle user input. If the variance is minor File::Spec or a little manual clean-up may be good enough for your needs.
my $a = 'http://site.com';
my @paths = qw( /foo/bar foo //foo/bar );
# bad paths don't work:
print join("\n", "Bad URIs:", map "$a/$_", @paths), "\n";
# strip leading slashes from a copy of each path (s///r needs Perl 5.14+)
my @cleaned = map { s{^/+}{}r } @paths;
print join("\n", "Cleaned URIs:", map "$a/$_", @cleaned), "\n";
When you have to handle bad stuff like $path = '/./foo/.././foo/../foo/bar'; is when you definitely want to use a library. Some of this could be tidied up with File::Spec's canonpath function, although canonpath does not collapse x/../y segments.
If you are worried about bad/bizarre stuff in the URI rather than just path issues (usernames, passwords, bizarre protocol specifiers) or URL encoding of strings, then using a URI library is really important, and is indisputably not overkill.
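As a minimal sketch of the "little manual clean-up" mentioned above, the following uses a hypothetical helper, join_url (not part of any module), that trims slashes on both sides of the seam before joining. It assumes clean inputs otherwise; it does nothing about "..", query strings or encoding, which is where the URI module earns its keep.

```perl
use strict;
use warnings;

# Hypothetical helper: join a base URL and a path with exactly one slash.
sub join_url {
    my ($base, $path) = @_;
    $base =~ s{/+\z}{};    # drop trailing slashes from the base
    $path =~ s{\A/+}{};    # drop leading slashes from the path
    return "$base/$path";
}

print join_url('http://site.com/', '/index.html'), "\n";
print join_url('http://site.com',  '//foo/bar'),   "\n";
```

Both calls print a URL with a single slash between host and path, regardless of how the two parts were delimited.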

You might want to take a look at this, an implementation of a function similar to Python's urljoin, but in Perl:
http://sveinbjorn.org/urljoin_function_implemented_using_Perl

As I am used to Java's java.net.URL methods, I was looking for a similar way to concatenate URIs without any assumption about scheme, host or port (in my case, it is for possibly complex Subversion URLs):
http://site.com/page/index.html
+ images/background.jpg
=> http://site.com/page/images/background.jpg
Here is the way to do it in Perl:
use URI;
my $base = URI->new("http://site.com/page/index.html");
my $result = URI->new_abs("images/background.jpg", $base);

Related

How to fetch one table from an HTML source file using the LWP module?

I'm a beginner. I want to know how to fetch one table from the source HTML file using the LWP module. Is it possible to use a regex with LWP?
You can use LWP to get the HTML source of a web page. Most easily, by using the get() function from LWP::Simple.
use LWP::Simple;
my $html = get('http://example.com/');
Now, in $html you have a text string (potentially a very long text string) which contains HTML. You can use any techniques you want to extract data from that string.
(Hint: Using a regex to do this is likely to be a very bad idea. It will be far harder than you expect and probably very fragile. Perhaps use a better tool - like HTML::TableExtract instead.)
use Web::Query::LibXML 'wq';
wq('https://www.december.com/html/demo/table.html')
->find('table th')
->each(sub {
my (undef, $e) = @_;
print $e->text . "\n";
});
__END__
Outer Table
Inner Table
CORNER
Head1
Head2
Head3
Head4
Head5
Head6
Little

How to convert HTML file into a hash in Perl?

Is there any simple way to convert an HTML file into a Perl hash? For example, a working Perl module or something?
I searched cpan.org but didn't find anything that does what I want. I want to do something like this:
use Example::Module;
my $hashref = Example::Module->new('/path/to/mydoc.html');
After this I want to refer to second div element something like this:
my $second_div = $hashref->{'body'}->{'div'}[1];
# or like this:
my $second_div = $hashref->{'body'}->{'div'}->findByClass('.myclassname');
# or like this:
my $second_div = $hashref->{'body'}->{'div'}->findById('#myid');
Is there any working solution for this?
HTML::TreeBuilder::XPath gives you a lot more power than a simple hash would.
From the synopsis:
use HTML::TreeBuilder::XPath;
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_file( "mypage.html");
my $nb=$tree->findvalue('/html/body//p[@class="section_title"]/span[@class="nb"]');
my $id=$tree->findvalue('/html/body//p[@class="section_title"]/@id');
my $p= $tree->findnodes('//p[@id="toto"]')->[0];
my $link_texts= $p->findvalue( './a'); # the texts of all a elements in $p
$tree->delete; # to avoid memory leaks, if you parse many HTML documents
More on XPath.
Mojo::DOM (docs found here) builds a simple DOM, that can be accessed in a CSS-selector style:
# Find
say $dom->at('#b')->text;
say $dom->find('p')->pluck('text');
say $dom->find('[id]')->pluck(attr => 'id');
In case you're using xhtml you could also use XML::Simple, which produces a data structure similar to the one you describe.

Perl Dancer trailing slash

Using the Perl web application framework Dancer, I am having some problems with trailing slashes in the URL matching.
Say for example, I want to match the following URL, with an optional Id parameter:
get '/users/:id?' => sub
{
#Do something
}
Both /users/morgan and /users/ match, but /users does not, which does not seem very uniform. I would prefer to match only the URLs without the trailing slash:
/users/morgan and /users. How would I achieve that?
Another approach is to use a named sub - all the examples of Dancer code tend to use anonymous subs, but there's nothing that says it has to be anonymous.
get '/users' => \&show_users;
get '/users/:id' => \&show_users;
sub show_users
{
#Do something
}
Note that, due to the way Dancer does the route matching, this is order-dependent and, in my experience, I've had to list the routes with fewer elements first.
The named capture id will contain everything after /users/ (any run of non-slash characters), if present.
get qr{^/users/?(?<id>[^/]+)?$} => sub {
my $captures = captures;
if ( defined $captures->{id} ) {
return sprintf 'the id is: %s', $captures->{id};
}
else {
return 'global user page'
}
};
I know this is an old question, but I've recently solved this problem by using a Plack middleware. There are two of them you can choose from depending on whether you prefer URLs with trailing slashes or not:
Plack::Middleware::TrailingSlash
Plack::Middleware::TrailingSlashKiller
Using any of the middleware above should greatly simplify your core Dancer application code and unit tests since you do not need to handle both cases.
In addition, as mentioned by Dave Sherohman, you should definitely arrange your routes with the fewer elements first in order to match those first, especially if you use the TrailingSlash middleware to force trailing slashes.

Using hash as a reference is deprecated

I searched SO before asking this question, I am completely new to this and have no idea how to handle these errors. By this I mean Perl language.
When I put this
%name->{@id[$#id]} = $temp;
I get the error Using a hash as a reference is deprecated
I tried
$name{@id[$#id]} = $temp
but couldn't get any results back.
Any suggestions?
The correct way to access an element of hash %name is $name{'key'}. The syntax %name->{'key'} was valid in Perl v5.6 but has since been deprecated.
Similarly, to access the last element of array @id you should write $id[$#id] or, more simply, $id[-1].
Your second variation should work fine, and your inability to retrieve the value has an unrelated reason.
Write
$name{$id[-1]} = 'test';
and
print $name{$id[-1]};
will display test correctly
%name->{...}
has always been buggy. It doesn't do what it should do. As such, it now warns when you try to use it. The proper way to index a hash is
$name{...}
as you already believe.
Now, you say
$name{@id[$#id]}
doesn't work, but if so, it's because of an error somewhere else in the code. That code most definitely works:
>perl -wE"@id = qw( a b c ); %name = ( a=>3, b=>4, c=>5 ); say $name{@id[$#id]};"
Scalar value @id[$#id] better written as $id[$#id] at -e line 1.
5
As the warning says, though, the proper way to index an array isn't
@id[...]
It's actually
$id[...]
Finally, the easiest way to get the last element of an array is to use index -1. That means your code should be
$name{ $id[-1] }
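To make the sigil rule concrete, here is a small sketch using the same data as the one-liner above (the variable names @last_two and @vals are only for illustration): $ fetches a single element, while @ on an array or hash fetches a slice, that is, a list of elements.

```perl
use strict;
use warnings;

my @id   = qw( a b c );
my %name = ( a => 3, b => 4, c => 5 );

# Single elements: the sigil is $, whatever the container is.
my $last_id    = $id[-1];             # last array element
my $last_value = $name{ $id[-1] };    # hash element keyed by it

# Slices: the sigil is @, because the result is a list.
my @last_two = @id[ -2, -1 ];         # array slice
my @vals     = @name{ qw( a b ) };    # hash slice

print "$last_id => $last_value\n";
print "@last_two / @vals\n";
```

Writing @id[$#id] asks for a one-element slice where a scalar was meant, which is why perl warns about it.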
The popular answer is to just not dereference, but that's not correct. In other words %$hash_ref->{$key} and %$hash_ref{$key} are not interchangeable. The former is required to access a hash reference nested as an element in another hash reference.
For many moons it has been commonplace to nest hash references. In fact there are several modules that parse data and store it in this kind of data structure. Instantly deprecating the behavior without module updates was not a good thing. At times my data is trapped in a nested hash and the only way to get it is to do something like this:
$new_hash_ref = $target_hash_ref->{$key1};
$new_hash_ref2 = $target_hash_ref->{$key2};
$new_hash_ref3 = $target_hash_ref->{$key3};
because I can't
foreach my $i(keys(%$target_hash_ref)) {
foreach(%$target_hash_ref->{$i}) {
#do stuff with $_
}
}
anymore.
Yes the above is a little strange, but creating new variables just to avoid accessing a data structure in a certain way is worse. Am I missing something?
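For what it's worth, a nested hash reference can still be walked without temporary variables by wrapping the inner expression in a %{ ... } block. The sketch below uses made-up sample data (the keys alice, bob, age and team are assumptions for illustration only).

```perl
use strict;
use warnings;

# Sample data, invented for this sketch: a hash ref of hash refs.
my $target_hash_ref = {
    alice => { age => 30, team => 'Red'  },
    bob   => { age => 25, team => 'Blue' },
};

my @lines;
for my $i ( sort keys %$target_hash_ref ) {
    # %{ EXPR } dereferences whatever hash ref EXPR evaluates to.
    for my $k ( sort keys %{ $target_hash_ref->{$i} } ) {
        push @lines, "$i.$k = $target_hash_ref->{$i}{$k}";
    }
}
print "$_\n" for @lines;
```

The block form %{ $target_hash_ref->{$i} } applies the dereference to the whole inner expression, which is what the deprecated %$target_hash_ref->{$i} spelling was relied upon to do.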
If you want one item from an array or hash, use $. For a list of items, use @ and % respectively. Your use of @ as a reference returned a list instead of an item, which perl may have interpreted as a hash.
This code demonstrates indexing a hash with the last element of an array.
#!/usr/bin/perl -w
my %these = ( 'first'=>101,
'second'=>102,
);
my @those = qw( first second );
print $these{$those[$#those]};
prints '102'

Creating a sort of "composable" parser for log files

I've started a little pet project to parse log files for Team Fortress 2. The log files have an event on each line, such as the following:
L 10/23/2009 - 21:03:43: "Mmm... Cycles!<67><STEAM_0:1:4779289><Red>" killed "monkey<77><STEAM_0:0:20001959><Blue>" with "sniperrifle" (customkill "headshot") (attacker_position "1848 813 94") (victim_position "1483 358 221")
Notice there are some common parts of the syntax for log files. Names, for example consist of four parts: the name, an ID, a Steam ID, and the team of the player at the time. Rather than rewriting this type of regular expression, I was hoping to abstract this out slightly.
For example:
my $name = qr/(.*)<(\d+)><(.*)><(Red|Blue)>/;
my $kill = qr/"$name" killed "$name"/;
This works nicely, but the regular expression now returns results that depend on the format of $name (breaking the abstraction I'm trying to achieve). The example above would match as:
my ($name_1, $id_1, $steam_1, $team_1, $name_2, $id_2, $steam_2, $team_2)
But I'm really looking for something like:
my ($player1, $player2)
Where $player1 and $player2 would be tuples of the previous data. I figure the "killed" event doesn't need to know exactly about the player, as long as it has information to create the player, which is what these tuples provide.
Sorry if this is a bit of a ramble, but hopefully you can provide some advice!
I think I understand what you are asking. What you need to do is reverse your logic. First use a regex to split the string into two parts, then extract your tuples. That way the outer regex doesn't need to know about the name format, and you just have two generic player-parsing regexes. Here is a short example:
#!/usr/bin/perl
use strict;
use Data::Dumper;
my $log = 'L 10/23/2009 - 21:03:43: "Mmm... Cycles!<67><STEAM_0:1:4779289><Red>" killed "monkey<77><STEAM_0:0:20001959><Blue>" with "sniperrifle" (customkill "headshot") (attacker_position "1848 813 94") (victim_position "1483 358 221")';
my ($player1_string, $player2_string) = $log =~ m/(".*") killed (".*?")/;
my @player1 = $player1_string =~ m/(.*)<(\d+)><(.*)><(Red|Blue)>/;
my @player2 = $player2_string =~ m/(.*)<(\d+)><(.*)><(Red|Blue)>/;
print STDERR Dumper(\@player1, \@player2);
Hope this is what you were looking for.
Another way to do it, but the same strategy as dwp's answer:
my @players =
map { [ /(.*)<(\d+)><(.*)><(Red|Blue)>/ ] }
$log_text =~ /"([^\"]+)" killed "([^\"]+)"/
;
Your log data contains several items of balanced text (quoted and parenthesized), so you might consider Text::Balanced for parts of this job, or perhaps a parsing approach rather than a direct attack with regex. The latter might be fragile if the player names can contain arbitrary input, for example.
Consider writing a Regexp::Log subclass.
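Putting the two-stage idea from the answers above together, here is one possible sketch of a "composable" layout. The event pattern only knows that a player is some quoted text; a separate helper turns that text into a record. The names parse_player and parse_kill and the hash keys are hypothetical, and the sketch assumes player names never contain a double quote.

```perl
use strict;
use warnings;

# Player fields: name, numeric ID, Steam ID, team.
my $player_fields = qr/(.*)<(\d+)><([^<>]*)><(Red|Blue)>/;

# Hypothetical helper: turn the text between the quotes into a hash ref.
sub parse_player {
    my ($text) = @_;
    my ( $name, $id, $steam, $team ) = $text =~ /\A$player_fields\z/
        or return undef;
    return { name => $name, id => $id, steam => $steam, team => $team };
}

# The event pattern treats every quoted chunk as opaque text.
my $quoted = qr/"([^"]+)"/;
my $kill   = qr/$quoted killed $quoted with $quoted/;

# Hypothetical helper: parse one "killed" line into a nested record.
sub parse_kill {
    my ($line) = @_;
    my ( $attacker, $victim, $weapon ) = $line =~ $kill or return undef;
    return {
        attacker => parse_player($attacker),
        victim   => parse_player($victim),
        weapon   => $weapon,
    };
}

my $line = 'L 10/23/2009 - 21:03:43: "Mmm... Cycles!<67><STEAM_0:1:4779289><Red>" killed "monkey<77><STEAM_0:0:20001959><Blue>" with "sniperrifle" (customkill "headshot")';
my $event = parse_kill($line);
print "$event->{attacker}{name} ($event->{attacker}{team}) killed $event->{victim}{name} with $event->{weapon}\n";
```

With this split, changing the player format only touches $player_fields and parse_player; other event patterns can reuse $quoted without knowing how many fields a player has.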