Perl LibXML adding nodes in a loop

I'm trying to add nodes from one document to a new document I create, but it's not working and I don't know why. Here's the code that's going wrong:
my ($body_node) = $newdoc->findnodes('//body');
my @nodes = $source_doc->findnodes('//div[starts-with(@psname, "xyz")]');
foreach my $node (@nodes) {
    $body_node = $body_node->appendChild($node);
}
$newdoc->toFile($outfile);
The code looks for some named div tags and appends them to the body tag. The problem is that it's appending each one to the last div tag rather than to the body tag, so I end up with a bunch of nested divs:
</div></div></div></div></div></div></div></div></div></div></div></div>
</div></div></div></div></div></div></div></div></div></div></div></div></body></html>
If someone could tell me what I'm doing wrong I'd be eternally grateful.

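appendChild returns the node it has just appended, so $body_node = $body_node->appendChild($node) replaces $body_node with the new <div>, and the next <div> is appended inside it. That means you need to come back to <body> after adding each <div>: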
my ($body_node) = $newdoc->findnodes('//body');
my @nodes = $source_doc->findnodes('//div[starts-with(@psname, "xyz")]');
foreach my $node (@nodes) {
    $body_node = $body_node->appendChild($node);
    # appendChild returned the new <div>; go back to <body>
    ($body_node) = $newdoc->findnodes('//body');
}
open(my $out, '>', $outfile) or die "Cannot write $outfile: $!";
print {$out} $newdoc->toString();
close $out;
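A minimal self-contained sketch of an even simpler fix: call appendChild without assigning its return value, so $body_node never stops pointing at <body>. It assumes small inline documents in place of the real files, and it copies each node with importNode from XML::LibXML::Document (a swap-in, not what the question or answer used) so the source document is left untouched.

```perl
use strict;
use warnings;
use XML::LibXML;

# Stand-in documents (hypothetical); in the question these come from files.
my $source_doc = XML::LibXML->load_xml(string => <<'XML');
<html><body>
  <div psname="xyz1">one</div>
  <div psname="abc">skip</div>
  <div psname="xyz2">two</div>
</body></html>
XML
my $newdoc = XML::LibXML->load_xml(string => '<html><body/></html>');

my ($body_node) = $newdoc->findnodes('//body');
for my $node ($source_doc->findnodes('//div[starts-with(@psname, "xyz")]')) {
    # importNode makes a deep copy owned by $newdoc, leaving $source_doc intact.
    # appendChild's return value (the new child) is ignored, so $body_node
    # still refers to <body> on every iteration.
    $body_node->appendChild($newdoc->importNode($node));
}

print $newdoc->toString, "\n";
```

The matching divs end up as siblings directly under <body> instead of nested inside each other.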

How can I separate an image URL, HTTP URL and email from a string in Perl

This is a chat program. I managed to separate the images from most of the text, but the URL is still embedded in a string. There is no telling where the URL will appear in the string, and if there is text before or after it, that text ends up in the replacement too. I need to extract only the URL matched by the image regex and put it into the s/$image//.
I have tried while loops and foreach loops, and crashed the whole system with a for loop! I do get the image in place, but only if the URL is alone on its own line. The same thing happens with web page links.
if (($searchhttp = m/^http/sig)
    && ($search_image = m/(.jpg|.jpeg|.gif|.png)/ig)) {
    @jpgimage = @_;
    $jpgimage = $jpgimage[0];
    $jpgimage =~ grep(/(^https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6}) ([\/\w \.-]*)*\/?(?:.jpg|.jpeg|.gif|png)$/sig);
    $image = substr($jpgimage, 0);
    ($image) = split(/\s+/, $jpgimage);
    chomp($image);
    $filter =~ s/$image/<img src ='$image' align ='left'>/;
    print $image.'<BR>';
    #print $jpgimage.'<BR>';
}
If the URL is alone on its line, it works. If I type text before or after it, it does not: the whole string ends up in the a href or the img src.
I need a way to pull the URL out of the string.
Example:
It takes the whole text from that line and places it inside the brackets as one long string:
"testing if this works http://172.31.4.253/images/joe.jpg"
"https://www.perltutorial.org lets try this"
I have spent a month on this, and the outcome with this code is the best I've gotten!
There can be, and most likely will be, more than one image.
This is the outcome after I paste 5 pictures, one with the word Test in front; these 4 are placed in the img src:
http://172.31.4.253/images/joe.jpg
https://www.perltutorial.org/wp-content/uploads/2012/11/Perl-Tutorial.jpg
http://172.31.4.253/images/joe.jpg
https://www.perltutorial.org/wp-content/uploads/2012/11/Perl-Tutorial.jpg
URL parsing and handling is not trivial. It's very easy to get wrong, so it should be left to a battle-tested module if possible. Consider this code:
use strict;
use warnings;
use URI;
use URL::Search qw(extract_urls);

my $webpage = join "", <DATA>;    # wherever your data comes from
my %seen;
for my $url (extract_urls $webpage) {
    # s///g already replaced every copy; a second pass would rewrite the src
    next if $seen{$url}++;
    my $url_object = URI->new($url);
    my $host_ok    = $url_object->host =~ /\.(com|net|jp|org|uk)$/i;
    my $is_image   = $url_object->path =~ /\.(jpg|jpeg|gif|png)$/i;
    my $save_url   = $url_object->canonical;
    my $regex_for_url = quotemeta($url);
    $webpage =~ s/$regex_for_url/<img src="$save_url">/g
        if $host_ok && $is_image;
}
print $webpage;
__DATA__
https://docs.perl6.org
https://github.xxx/foo.gif
https://docs.perl6.org/camelia.png
https://docs.perl6.org/camelia.gif
Output
https://docs.perl6.org
https://github.xxx/foo.gif
<img src="https://docs.perl6.org/camelia.png">
<img src="https://docs.perl6.org/camelia.gif">
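If pulling in URI and URL::Search is not an option, a rough core-only alternative is a single regex substitution that rewrites only the image URLs and leaves the surrounding chat text alone. This is a simplified sketch, not equivalent to URL::Search: it assumes an image URL is http(s):// followed by non-whitespace ending in .jpg, .jpeg, .gif or .png, and linkify_images is a hypothetical name.

```perl
use strict;
use warnings;

# Simplified sketch (not the URL::Search approach): an image URL is taken to be
# http(s):// plus non-space characters ending in an image extension, followed
# by whitespace or the end of the string. All matches are rewritten in one pass.
sub linkify_images {
    my ($text) = @_;
    $text =~ s{(https?://\S+?\.(?:jpe?g|gif|png))(?!\S)}{<img src="$1" align="left">}gi;
    return $text;
}

my $line = 'testing if this works http://172.31.4.253/images/joe.jpg and more';
my $html = linkify_images($line);
print "$html\n";
```

Because the substitution only touches the matched URL, text typed before or after it stays outside the img tag, and several images on one line are each rewritten.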

WWW::Mechanize::Firefox looping though links

I am using a foreach to loop through links. Do I need a $mech->back(); to continue the loop, or is that implicit?
Furthermore, do I need a separate $mech2 object for nested foreach loops?
The code I currently have gets stuck (it does not complete) and ends on the first page where td#tabcolor3 is not found.
foreach my $sector ($mech->selector('a.link2'))
{
    $mech->follow_link($sector);
    foreach my $place ($mech->selector('td#tabcolor3'))
    {
        if (($mech->selector('td#tabcolor3', all=>1)) >= 1)
        {
            $mech->follow_link($place);
            print $_->{innerHTML}, "\n"
                for $mech->selector('td.dataCell');
            $mech->back();
        }
        else
        {
            $mech->back();
        }
    }
}
You cannot access information from a page once it is no longer displayed. However, foreach builds the whole list before it starts iterating, so the code you have written should be fine.
There is no need for the call to back, as the links are absolute. If you had used click, there would have to be a link on the current page to click on, but with follow_link all you are doing is going to a new URL.
There is also no need to check the number of links to follow, as a for loop over an empty list will simply not be executed.
To make things clearer, I suggest that you assign the results of selector to an array before the loop.
Like this:
my @sectors = $mech->selector('a.link2');
for my $sector (@sectors) {
    $mech->follow_link($sector);
    my @places = $mech->selector('td#tabcolor3');
    for my $place (@places) {
        $mech->follow_link($place);
        print $_->{innerHTML}, "\n" for $mech->selector('td.dataCell');
    }
}
Update
My apologies. It seems that follow_link is finicky and needs to follow a link on the current page.
I suggest that you extract the href attribute from each link and use get instead of follow_link.
my @selectors = map $_->{href}, $mech->selector('a.link2');
for my $selector (@selectors) {
    $mech->get($selector);
    my @places = map $_->{href}, $mech->selector('td#tabcolor3');
    for my $place (@places) {
        $mech->get($place);
        print $_->{innerHTML}, "\n" for $mech->selector('td.dataCell');
    }
}
Please let me know whether this works on the site you are connecting to.
I recommend using a separate $mech object for this:
foreach my $sector ($mech->selector('a.link2'))
{
    my $mech = $mech->clone();
    $mech->follow_link($sector);
    foreach my $place ($mech->selector('td#tabcolor3'))
    {
        if (($mech->selector('td#tabcolor3', all=>1)) >= 1)
        {
            my $mech = $mech->clone();
            $mech->follow_link($place);
            print $_->{innerHTML}, "\n"
                for $mech->selector('td.dataCell');
            #$mech->back();
        }
        # else
        # {
        #     $mech->back();
        # }
    }
}
I am using WWW::Mechanize::Firefox to loop over a bunch of URLs with lots of JavaScript. The pages do not render immediately, so I need to test whether a particular page element is visible before deciding on the next action (similar to the suggestion in the WWW::Mechanize::Firefox documentation, except with two XPaths in the test).
After about 2-3 seconds the page renders either an element matching a 'no info' XPath or the content I want. If it is 'no info', we go to the next URL. I think there is some sort of race condition while neither XPath exists yet, which intermittently causes the error MozRepl::RemoteObject: TypeError: can't access dead object (oddly enough, at the sleep 1 in the loop).
My solution, which seems to improve reliability, is to wrap all the $mech->get and $mech->is_visible calls in an eval {}, like this:
eval {
    $mech->get("$url");
    $retries = 15; # test to see if element visible = page complete
    while ($retries-- and ! $mech->is_visible( xpath => $xpath_btn ) and ! $mech->is_visible( xpath => $xpath_no_info )) {
        sleep 1;
    };
    last if ($mech->is_visible( xpath => $xpath_no_info )); # skip rest if no info page
};
Others might suggest improvements on this.
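One possible refinement of the retry loop above, sketched in core Perl: move the polling into a small helper that wraps each visibility probe in its own eval and reports which condition became true. wait_for_any and its retries/delay/checks arguments are hypothetical; the $mech calls appear only in the commented usage and are the same is_visible calls as above.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of WWW::Mechanize::Firefox: poll named checks
# (in the order given) until one returns true or the retries run out. Each
# check runs inside eval, so a transient exception (such as the intermittent
# "can't access dead object" TypeError) counts as "not ready yet".
sub wait_for_any {
    my (%args)  = @_;
    my $retries = defined $args{retries} ? $args{retries} : 15;
    my $delay   = defined $args{delay}   ? $args{delay}   : 1;
    my $checks  = $args{checks};          # array ref of [ name => code ref ]

    while ($retries-- > 0) {
        for my $check (@$checks) {
            my ($name, $code) = @$check;
            my $ok = eval { $code->() };
            return $name if $ok;          # first check that succeeded
        }
        sleep $delay if $retries > 0 && $delay > 0;
    }
    return;                               # timed out: nothing became true
}

# Possible usage inside the URL loop (sketch, not run here):
#   $mech->get($url);
#   my $state = wait_for_any(checks => [
#       [ no_info => sub { $mech->is_visible(xpath => $xpath_no_info) } ],
#       [ button  => sub { $mech->is_visible(xpath => $xpath_btn) } ],
#   ]);
#   next if !defined $state || $state eq 'no_info';
```

Keeping the eval around each individual probe, rather than around the whole page visit, means one failed probe does not abandon the rest of the wait.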

Magento error when trying to duplicate a product

I am using Magento 1.7 and have an issue I can't explain. I open a product in the backend for editing, click Duplicate, and get the following error:
Warning: Illegal string offset 'new_file' in D:\wamp\www\easyshop\app\code\core\Mage\Catalog\Model\Product\Attribute\Backend\Media.php on line 158
When I add the following code to the file to debug it:
print_r($newImages);
die;
I get the following data:
Array
(
[/s/a/samsung_galaxy_s2_front1.jpg] => /s/a/samsung_galaxy_s2_front1_4.jpg
[/s/g/sgs2p1.jpg] => /s/g/sgs2p1_4.jpg
[/s/g/sgs2_11.jpg] => /s/g/sgs2_11_4.jpg
[/s/g/sgs2-4386.jpg] => /s/g/sgs2-4386_4.jpg
)
I think the array keys are wrong. Can you please suggest a solution to this problem?
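I had the same problem on 1.7.02. The solution I found was to change Magento's (IMHO) buggy code. The code that raises the warning reads each $newImages entry as an array with a 'new_file' key, but when duplicating, $newImages is filled with plain strings (as your print_r output shows), hence the "Illegal string offset 'new_file'" warning.
In Mage_Catalog_Model_Product_Attribute_Backend_Media I changed the lines that read: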
// For duplicating we need copy original images.
$duplicate = array();
foreach ($value['images'] as &$image) {
    if (!isset($image['value_id'])) {
        continue;
    }
    $duplicate[$image['value_id']] = $this->_copyImage($image['file']);
    $newImages[$image['file']] = $duplicate[$image['value_id']];
}
to:
// For duplicating we need copy original images.
$duplicate = array();
foreach ($value['images'] as &$image) {
    if (!isset($image['value_id'])) {
        continue;
    }
    $duplicate[$image['value_id']] = $this->_copyImage($image['file']);
    $newImages[$image['file']] = array();
    $newImages[$image['file']]['new_file'] = $duplicate[$image['value_id']];
    $newImages[$image['file']]['label'] = $image['label'];
}
It did the trick for me: images are now properly duplicated and enabled on the new product.

Program exhibiting bizarre behavior when reading words from a file

So I have two files: one contains my text, and the other contains the filter words. The sub shown here reads the one with the curse words. Basically, I iterate through each word in the text file and compare it against the curse words.
sub filter {
    $word_to_check = $_;
    open ( FILE2, $ARGV[1]) || die "Something went wrong. \n";
    while(<FILE2>) {
        @cursewords = split;
        foreach $curse (@cursewords) {
            print $curse."\n";
            if($word_to_check eq $curse) { return "BAD!";}
        }
    }
    close ( FILE2 );
}
Here are the "curse words":
what is
Here is the text file:
hey dude what is up
But here's what's going wrong. As you can see, I've put in a print statement to check whether the curse words are being compared correctly. This is the output:
hey what
is
dude what
is
what what
is
is what
is
up what
is
I literally have no idea why this could be happening. Please let me know if I should post more code.
EDIT:
AHA! Thanks, evil otto. It seems I was confused by another print statement I had added earlier. Now the remaining problem is that I think I'm not checking for string equality correctly. Here's where filter is called:
foreach $w (@text_file_words)
{
    if (filter($w) eq "BAD!")
    {
        #do something here
    }
    else { print "good!"; }
}
EDIT 2: Never mind, more stupidity on my part. I need to get some sleep. Thanks, evil otto.
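Subroutine arguments arrive in @_, not in $_, so change
$word_to_check = $_;
to
$word_to_check = shift;
(Inside a sub, shift with no argument removes and returns the first element of @_.)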
In Perl, a sub receives its arguments in the array @_, so you can also unpack them like this:
sub myFunction {
    my ($wordToCheck) = @_;    # for more than one argument, list the variables in the parentheses, separated by commas
}
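For reference, a sketch of the whole filter with the argument fix applied. It also changes the design (an assumption, not the asker's code): the filter words are read once into a hash, so each word costs a hash lookup instead of a re-read of the file. In-memory strings stand in for the two files so the example runs on its own; load_words is a hypothetical helper name.

```perl
use strict;
use warnings;

# In-memory stand-ins for the two files ($ARGV[1] holds the curse words in the question).
my $curse_text = "what is\n";
my $body_text  = "hey dude what is up\n";

# Hypothetical helper: read every whitespace-separated word into a hash set.
sub load_words {
    my ($fh) = @_;                     # arguments arrive in @_, not $_
    my %words;
    while (my $line = <$fh>) {
        $words{$_} = 1 for split ' ', $line;
    }
    return \%words;
}

# Same return convention as the question: "BAD!" for a filtered word.
sub filter {
    my ($word_to_check, $filter_words) = @_;
    return exists $filter_words->{$word_to_check} ? 'BAD!' : 'good';
}

open(my $curse_fh, '<', \$curse_text) or die "Cannot open filter words: $!";
my $curse_words = load_words($curse_fh);
close $curse_fh;

my @results = map { filter($_, $curse_words) } split ' ', $body_text;
print "@results\n";
```

With real files, open $ARGV[1] by name instead of the scalar reference; the rest stays the same.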

How to cancel a file upload based on file size in Catalyst

I'm writing a file upload handler in Catalyst and trying to restrict the maximum file size. To do this I've made a plugin (based on the answer here). Here is the code where I check the file size:
before 'prepare_body' => sub {
    my $c = shift;
    my $req = $c->request;
    my $length = $req->headers->{"content-length"};
    if ($length > 10000)
    {
        $c->stash->{errors} = "File upload error";
        # how do I abort the upload?
    }
};
This correctly detects files that are too big, but I can't for the life of me figure out how to abort the upload. Ideally, the request should still reach the controller/action. Can anyone give me a pointer? Thanks a lot.
Very simply, you probably shouldn't. Anything you do from plugin code to abort the handling is going to knock out the ability of user code to deal with the situation in a nice way (for example by giving a validation error or a nice error page, instead of a Catalyst exception page).
However, all is not lost. Why not try something like this?
around 'prepare_body' => sub {
    my ($orig, $self) = (shift, shift);
    my ($c) = @_;
    my $max_length = $c->config->{'Plugin::WhateverMyNameIs'}->{max_request_size};
    $max_length = 1_000_000 unless defined $max_length; # default
    my $length = $c->engine->read_length;
    if ($length <= $max_length) { # ok, go ahead
        $self->$orig(@_);
    } else {
        $c->stash->{request_body_aborted} = 1;
    }
};
This will stop the read if your request is oversized, but it will let dispatch proceed as normal. That means you will want to write some code in your action, in a begin action, or in a chain root that checks $c->stash->{request_body_aborted} and does something appropriate, whether that's setting a form validation error, calling $c->error("Request too large"); $c->detach, or whatever else fits. It's also configurable, as any plugin should be.
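The size decision itself can be pulled out into a small function that is easy to test without Catalyst. body_too_large is a hypothetical helper; the default limit matches the 1_000_000 above, and treating a missing or non-numeric length as acceptable is an assumption (the question's $length > 10000 would emit an uninitialized-value warning under use warnings if the header were missing).

```perl
use strict;
use warnings;

# Hypothetical helper, independent of Catalyst: true when the declared body
# length exceeds the limit. A missing or non-numeric length is treated as
# "not too large" here; a stricter application might reject it instead.
sub body_too_large {
    my ($length, $max_length) = @_;
    $max_length = 1_000_000 unless defined $max_length;   # same default as above
    return 0 unless defined $length && $length =~ /\A\d+\z/;
    return $length > $max_length ? 1 : 0;
}

# Inside the around modifier this might be used as (sketch, not run here):
#   if (body_too_large($length, $max_length)) {
#       $c->stash->{request_body_aborted} = 1;
#   } else {
#       $self->$orig(@_);
#   }
```

Keeping the comparison in a plain sub lets the limit logic be unit-tested separately from the request cycle.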
I think this needs to occur earlier in the chain. If you have the headers, then the packet is already created.
Perhaps you could try $c->detach(), or possibly loop through the $c->stack array and remove any actions related to your upload that might have been added.