How would I match the correct hash pair based on a specific string? - perl

I have a simple page hit tracking script that allows for the output to display friendly names instead of urls by using a hash.
UPDATE: I used php to generate the hash below, but used the wrong dynamic page name of item.html. When changed to the correct name, the script returns the desired results. Sorry for wasting anyone's time.
my %LocalAddressTitlePairs = (
'https://www.mywebsite.com/index.html' => 'HOME',
'https://www.mywebsite.com/art_gallery.html' => 'GALLERY',
'https://www.mywebsite.com/cart/item.html?itemID=83&cat=26' => 'Island Life',
'https://www.mywebsite.com/cart/item.html?itemID=11&cat=22' => 'Castaways',
'https://www.mywebsite.com/cart/item.html?itemID=13&cat=29' => 'Pelicans',
# and so on...
);
The code for returning the page hits:
sub url_format {
local $_ = $_[0] || '';
if ((m!$PREF{'My_Web_Address'}!i) and (m!^https://(.*)!i) ) {
if ($UseLocalAddressTitlePairs == 1) {
foreach my $Address (keys %LocalAddressTitlePairs) {
return "<a title=\"$Address\" href=\"$_\">$LocalAddressTitlePairs{$Address}</A>" if (m!$_$! eq m!$Address$!);
}
}
my $stub =$1;
return $stub;
}
}
Displaying the log hits shows HOME and GALLERY with the correct links, but https://www.mywebsite.com/cart/item.html?itemID=83&cat=26
displays a random name instead of Island Life, which is what it should be for this page. The link itself is correct, but a different name displays every time the page is loaded.
Also, all pages with query strings display the exact same name. I know the links are correct from clicking through the site pages and checking the log script for my own page visits.
I tried -
while (my($mykey, $Value) = each %LocalAddressTitlePairs) {
return "<a title=\"$mykey\" href=\"$_\">$Value</a>" if(m!$_$! eq m!$mykey$!);
but again, the link is correct but the mykey/Value associated is random too. Way too new to perl to figure this out but I'm doing a lot of online research.

m!$Address$! does not work as expected because the interpolated string contains regex metacharacters such as ?
You need to wrap the variable in the escape sequences \Q and \E:
m!\Q$Address\E$!
It is even better to anchor the match at the start of the string with ^; otherwise a different URL that merely ends with yours also matches:
my $url = "https://www.mywebsite.com/?foo=bar";
my $bad_url = "https://bad.com?u=https://www.mywebsite.com/?foo=bar";
$bad_url =~ m!\Q$url\E$! ? 1 : 0; # 1, the bad URL passes
$bad_url =~ m!^\Q$url\E$! ? 1 : 0; # 0, the bad URL is rejected
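One more detail that may explain the random names: the condition in the question compares the results of two match operations with eq rather than comparing the two strings. For a URL containing ?, both matches fail, both return an empty string, and "" eq "" is true, so whichever key the hash happens to yield first is returned. Below is a minimal sketch of the lookup using a single anchored, quoted match. The helper name title_for and the two-entry hash are assumptions made for illustration, not part of the original script.

```perl
use strict;
use warnings;

# Shortened copy of the hash from the question (illustrative data only).
my %LocalAddressTitlePairs = (
    'https://www.mywebsite.com/index.html'                      => 'HOME',
    'https://www.mywebsite.com/cart/item.html?itemID=83&cat=26' => 'Island Life',
);

# Hypothetical helper: returns the friendly title for a URL, or undef
# (an empty list in list context) if no key matches.
sub title_for {
    my ($url) = @_;
    for my $address (keys %LocalAddressTitlePairs) {
        # \Q...\E quotes metacharacters such as ? and .
        # ^ and $ anchor the match at both ends of the string.
        return $LocalAddressTitlePairs{$address}
            if $url =~ m!^\Q$address\E$!i;
    }
    return;
}
```

If case-insensitive matching is not needed, a plain hash lookup such as exists $LocalAddressTitlePairs{$url} would avoid the loop entirely.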

Related

How can I separate a image, HTTP URL and email from a Perl string of code

This is a chat program. I managed to separate the Images from most of the text, but it leaves it embedded in a string. There is no telling where the URL will be located in the string, and if typing is before it or after it, it appears in the string! I need it to separate and place only the image regex URL into the s/$image//.
I have tried while loops, foreach loops and crashed the whole system with a for loop! I do get the image in place but only if I leave a whole blank line for it. Same thing with the webpage....
if (($searchhttp = m/^http/sig)
&& ($search_image = m/(.jpg|.jpeg|.gif|.png)/ig)) {
@jpgimage = @_;
$jpgimage = $jpgimage[0];
$jpgimage =~ grep(/(^https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6}) ([\/\w \.-]*)*\/?(?:.jpg|.jpeg|.gif|png)$/sig);
$image = substr($jpgimage, 0);
($image) = split(/\s+/, $jpgimage);
chomp($image);
$filter =~ s/$image/<img src ='$image' align ='left'>/;
print $image.'<BR>';
#print $jpgimage.'<BR>';
}
If I leave it on just one line, it works... If I type before it or after it it does not. it includes the whole string in the a href, or the img src.
I need to find a way to take it out of the string
Example...
It takes the whole text from that line and places it in the right brackets, just one long string...
"testing if this works http://172.31.4.253/images/joe.jpg"
"https://www.perltutorial.org lets try this"
I have spent a month on this... and the outcome with this code is the best I've gotten!
There could be, and most likely will be, more than one image.
This is the outcome after I paste 5 pictures, one with the word Test in front; these 4 are placed in the img src...
http://172.31.4.253/images/joe.jpg
https://www.perltutorial.org/wp-content/uploads/2012/11/Perl-Tutorial.jpg
http://172.31.4.253/images/joe.jpg
https://www.perltutorial.org/wp-content/uploads/2012/11/Perl-Tutorial.jpg
URL parsing and handling is not trivial. It is very easy to get wrong, so it should be left to a battle-tested module if possible. Consider this code.
use URI;
use URL::Search qw(extract_urls);
my $webpage = join "", <DATA>; # wherever your data comes from
for my $url (extract_urls $webpage)
{
my $url_object = URI->new( $url );
my $host_ok = $url_object->host =~ /\.(com|net|jp|org|uk)$/i;
my $is_image = $url_object->path =~ /\.(jpg|jpeg|gif|png)$/i;
my $save_url = $url_object->canonical;
my $regex_for_url = quotemeta( $url );
$webpage =~ s/$regex_for_url/<img src="$save_url">/g
if $host_ok && $is_image;
}
print $webpage;
__DATA__
https://docs.perl6.org
https://github.xxx/foo.gif
https://docs.perl6.org/camelia.png
https://docs.perl6.org/camelia.gif
Output
https://docs.perl6.org
https://github.xxx/foo.gif
<img src="https://docs.perl6.org/camelia.png">
<img src="https://docs.perl6.org/camelia.gif">

How do I add this value to an Array and it stays in the script

I want to add a command that adds numbers to the array.
This is what i have exactly:
my $ownerids = ('374867065');
Then later in the script i have this:
if($ownerids == $spl2[0]){
if (index($message, "!adduser") != -1) {
$msg = $spl[1];
$send = "<m t=\"User Added $msg\" u=\"$botid\" />\0";
$socket->send($send);
push (my $ownerids, "$msg");
}
}
I am on a chatbox and this is a chatbot. I want it so that when I say !adduser (their ID), it adds them to a list and they can use the bot commands. I also want a Delete User command. If you can help, this will be MUCH appreciated.
If you want ownerids to be an array, then you must prefix it with a @
my @ownerids = ('374867065');
Then to add an element, you can push
push @ownerids, $msg;
However, you're going to need to fix your other references to @ownerids so it's treated like an array. For example, your first if looks like it's intending to see if $spl2[0] is an owner. If that's the case, then you'll need to grep the array:
if(grep {$_ == $spl2[0]} @ownerids) {
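The question also asks for a Delete User command. The following is a minimal sketch of add, check, and remove operations on the array; the helper names is_owner, add_owner, and remove_owner are hypothetical, and it assumes IDs are compared as strings. Note that the list only lives as long as the script runs, so it would need to be written to a file to survive a restart.

```perl
use strict;
use warnings;

# Declared once at file scope, without "my" inside the handler,
# so every command sees the same list.
my @ownerids = ('374867065');

# True if the given ID is already in the list.
sub is_owner {
    my ($id) = @_;
    return scalar grep { $_ eq $id } @ownerids;
}

# Add an ID unless it is already present.
sub add_owner {
    my ($id) = @_;
    push @ownerids, $id unless is_owner($id);
    return;
}

# Remove an ID by keeping every element that does not equal it.
sub remove_owner {
    my ($id) = @_;
    @ownerids = grep { $_ ne $id } @ownerids;
    return;
}
```

In the bot, add_owner($spl[1]) would be called when the message contains !adduser, and remove_owner($spl[1]) for a matching delete command.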

How to keep the first parameter value only?

In my Perl Catalyst application, I get the value of a URL parameter like this, typically:
my $val = $c->request->params->{arg} || '';
But the URL could contain multiple arg=$Val. I only want to keep the first value of arg=. I could add this throughout my code:
my $val = $c->request->params->{arg} || '';
$val = $val->[0] if (ref($val) eq 'ARRAY');
That is rather ugly. Is there a better way to pick up the first value of a URL parameter?
Does your app actually expect multiple values for parameter arg? If not, all you need is
my $val = $c->request->params->{arg} || '';
Sure, it will be garbage if the user provides you with a garbage url, but there's nothing you can do to prevent the user from giving you garbage.
If it's actually valid to have more than one value for parameter arg, why would you want just the first value? You'd actually want all the values.
sub param_vals {
    my ($params, $name) = @_;
    return () if !exists($params->{$name});            # parameter absent
    return $params->{$name} if !ref($params->{$name}); # single value
    return @{ $params->{$name} };                      # multiple values
}
my @args = param_vals($c->request->params, 'arg');
I just read the code to Catalyst::Request but I don't see anything to always pull out a single value. Too bad Cat doesn't use something like Hash::MultiValue!
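If only the first value is wanted, the check from the question can at least be written once instead of being repeated throughout the code. A minimal sketch, assuming a hypothetical helper named first_param that takes the parameter hash reference and a parameter name:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first value of a parameter,
# or '' when the parameter is missing or empty.
sub first_param {
    my ($params, $name) = @_;
    my $val = $params->{$name};
    return '' unless defined $val;
    # Multiple values arrive as an array reference; take the first.
    $val = $val->[0] if ref($val) eq 'ARRAY';
    return defined $val ? $val : '';
}
```

In a Catalyst action this would be called as first_param($c->request->params, 'arg').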

WWW::Mechanize::Firefox looping though links

I am using a foreach to loop through links. Do I need a $mech->back(); to continue the loop, or is that implicit?
Furthermore do I need a separate $mech2 object for nested for each loops?
The code I currently have gets stuck (it does not complete) and ends on the first page where td#tabcolor3 is not found.
foreach my $sector ($mech->selector('a.link2'))
{
$mech->follow_link($sector);
foreach my $place ($mech->selector('td#tabcolor3'))
{
if (($mech->selector('td#tabcolor3', all=>1)) >= 1)
{
$mech->follow_link($place);
print $_->{innerHTML}, '\n'
for $mech->selector('td.dataCell');
$mech->back();
}
else
{
$mech->back();
}
}
You cannot access information from a page when it is no longer on display. However, the way foreach works is to build the list first before it is iterated through, so the code you have written should be fine.
There is no need for the call to back as the links are absolute. If you had used click then there must be a link in the page to click on, but with follow_link all you are doing is going to a new URL.
There is also no need to check the number of links to follow, as a for loop over an empty list will simply not be executed.
To make things clearer I suggest that you assign the results of selector to an array before the loop.
Like this
my @sectors = $mech->selector('a.link2');
for my $sector (@sectors) {
    $mech->follow_link($sector);
    my @places = $mech->selector('td#tabcolor3');
    for my $place (@places) {
        $mech->follow_link($place);
        print $_->{innerHTML}, "\n" for $mech->selector('td.dataCell');
    }
}
Update
My apologies. It seems that follow_link is finicky and needs to follow a link on the current page.
I suggest that you extract the href attribute from each link and use get instead of follow_link.
my @selectors = map $_->{href}, $mech->selector('a.link2');
for my $selector (@selectors) {
    $mech->get($selector);
    my @places = map $_->{href}, $mech->selector('td#tabcolor3');
    for my $place (@places) {
        $mech->get($place);
        print $_->{innerHTML}, "\n" for $mech->selector('td.dataCell');
    }
}
Please let me know whether this works on the site you are connecting to.
I recommend using a separate $mech object for this:
foreach my $sector ($mech->selector('a.link2'))
{
my $mech = $mech->clone();
$mech->follow_link($sector);
foreach my $place ($mech->selector('td#tabcolor3'))
{
if (($mech->selector('td#tabcolor3', all=>1)) >= 1)
{
my $mech = $mech->clone();
$mech->follow_link($place);
print $_->{innerHTML}, "\n"
for $mech->selector('td.dataCell');
#$mech->back();
}
# else
# {
# $mech->back();
# }
}
}
I am using WWW::Mechanize::Firefox to loop over a bunch of URLs with loads of JavaScript. The page does not render immediately, so I need to test whether a particular page element is visible (similar to the suggestion in the Mechanize::Firefox documentation, except with 2 xpaths in the test) before deciding the next action.
The page eventually renders an xpath to either 'no info' or some wanted stuff after about 2-3 seconds. If there is no info, we go to the next URL. I think there is some sort of race condition when neither xpath exists yet, which intermittently causes the MozRepl::RemoteObject: TypeError: can't access dead object error (at the sleep 1 in the loop, oddly enough).
My solution, which seems to work and improve reliability, is to enclose all the $mech->get and $mech->is_visible calls in an eval{}; like this:
eval{
$mech->get("$url");
$retries = 15; #test to see if element visible = page complete
while ($retries-- and ! $mech->is_visible( xpath => $xpath_btn ) and ! $mech->is_visible( xpath => $xpath_no_info )){
sleep 1;
};
last if($mech->is_visible( xpath => $xpath_no_info) ); #skip rest if no info page
};
Others might suggest improvements on this.

How should I redirect users in a formmail script?

So I'm using a basic formmail script. Within the script I'm using a redirect variable. The value of the redirect is something like:
http://www.mysite.com/NewOLS_GCUK_EN/bling.aspx?BC=GCUK&IBC=CSEE&SIBC=CSEE
When the redirect action happens however, the URL appears in the browser as:
http://www.mysite.com/NewOLS_GCUK_EN/bling.aspx?BC=GCUK&amp;IBC=CSEE&amp;SIBC=CSEE
You can see the & characters are replaced with &amp;
Is there any way to fix this?
Maybe you can edit the script with a string substitution:
$myRedirectURL =~ s/&amp;/&/g;
Or perhaps look in the script where the opposite substitution is taking place, and comment out that step.
HTML::Entities's decode_entities could decode this for you:
$redirect_target = decode_entities($redirect_target);
But passing the destination URL as an HTTP argument (e.g. a hidden form field) is dangerous (as @Sinan Ünür already said in the comments). It is better to store the target URLs within your script and pass a selector from the form:
if ($selector eq 'home') { $target_url = 'http://www.foo.bar/'; }
elsif ($selector eq 'bling') { $target_url = 'http://www.foo.bar/NewOLS_GCUK_EN/bling.aspx'; }
else {
$target_url = 'http://www.foo.bar/default.html'; # Fallback/default value
}
Using a Hash would be shorter:
my %targets = (
    home => 'http://www.foo.bar/',
    bling => '/NewOLS_GCUK_EN/bling.aspx',
);
$target_url = $targets{$selector} || '/default_feedback_thanks.html';