This is more a question for XPath syntax than anything else.
I have multiple product pages on a site that have multiple products on each product pages. Each product has a unique ID for the add-to-cart button. I'm trying to return all of the unique ID's so that I can add a couple of the products to the bag. Searching with XPath seems to be the correct solution for this. I have the following code for querying the HTML with XPath and returning the unique ID's:
$XPATH_COUNT = $sel->get_xpath_count("//div[\#class='quick-info-link']/a");
#my_array;
$my_array[0] = $sel->get_attribute("//div[\#class='quick-info-link']/a/\#id");
print $my_array[0];
$count = 0;
while( $count < $XPATH_COUNT )
{
$arrayCount=0;
$a = "//";
foreach( #my_array )
{
$tmp = "a[\#id!='" . $my_array[$arrayCount] . "' and ";
$b .= $tmp;
$d .= "]";
$arrayCount++;
}
$c = "img[\#alt='Quick Shop']";
$e = $c . $d . "/\#id";
$xpath_query = $a . $b . $e;
$my_array[$count] = $sel->get_attribute($xpath_query);
$count++;
}
The output of the first run of this is an XPath query that looks like this:
//a[#id!='quickview-link-PROD7029' and img[#alt='Quick Shop']]/#id
Which correctly returns quickview-link-PROD6945. The second run produces this:
//a[#id!='quickview-link-PROD7029' and a[#id!='quickview-link-PROD6945' and img[#alt='Quick Shop']]]/#id
Which throws an error in my SeleniumRC terminal window of ERROR: Element [..xpath query..] not found on session.
I am aware of the possible use of indexes (i.e. adding an [i] to the end of the XPath query) to access elements on the page, however this isn't something that has worked for me in Selenium.
Any help would be great. Thanks for your time,
Steve
//a[#id!='quickview-link-PROD7029'
and a[#id!='quickview-link-PROD6945' and
img[#alt='Quick Shop']
]
]/#id
Which throws an error in my SeleniumRC
terminal window of ERROR: Element
[..xpath query..] not found on session
It would greatly help if you provide the XML document on which the XPath expression is applied and explain which node(s) you want to select.
Without this necessary information:
The most obvious reason for this problem is that the above expression is looking for a elements that have an a child with some property.
Usually an a element doesn't have any a children.
What you really want is something like:
//a[#id != 'quickview-link-PROD7029'
and
#id != 'quickview-link-PROD6945' and img[#alt='Quick Shop']
]/#id
This can be simplified a bit:
//a[img[#alt='Quick Shop']/#id
[not(. = 'quickview-link-PROD7029'
or
. = 'quickview-link-PROD6945'
)
]
Related
My code is to enter an actor name and the program, via the given actor's filmography in IMDB, lists on a hash table all the cinematic genres of the movies he has acted in as well as their frequency. However, I have a problem: When I type a name like "brad pitt" or "bruce willis" after running the program at the prompt, execution takes indefinitely. How do you know what the problem is?
Another problem: when I type "nicolas bedos" (an actor name that I entered from the beginning), it works but it seems that the index is only made for a single movie selected in the #url_links list. Should the look_down function of the TreeBuilder module within a foreach loop be adapted? I was telling myself that the #genres list was overwritten on each iteration so I added a push () but the result remains the same.
use LWP::Simple;
use PerlIO::locale;
use HTML::TreeBuilder;
use WWW::Mechanize;
binmode STDOUT, ':locale';
use strict;
use warnings;
print "Enter the actor's name:";
my $acteur1 = <STDIN>; # the user enters the name of the actor
print "We will analyze the filmography of the actor $actor1 by genre\n";
#we put the link with the given actor in Mechanize variable in order to browse the internet links
my $lien1 = "https://www.imdb.com/find?s=nm&q=$acteur1";
my $mech = WWW::Mechanize->new();
$mech->get($lien1); #we access the search page with the get function
$mech->follow_link( url_regex => qr/nm0/i ); #we access the first result using the follow_link function and the regular expression nm0 which is in the URL
my #url_links= $mech->find_all_links( url_regex => qr/title\/tt/i ); #owe insert in an array all the links having as regular expression "title" in their URL
my $nb_links = #url_links; #we record the number of links in the list in this variable
my $tree = HTML::TreeBuilder->new(); #we create the TreeBuilder module to access a specific text on the page via the tags
my %index; #we create a hashing table
my #genres = (); #we create the genre list to insert all the genres encountered
foreach (#url_links) { #we make a loop to browse all the saved links
my $mech2 = WWW::Mechanize->new();
my $html = $_->url(); #we take the url of the link
if ($html =~ m=^/title=) { #if the url starts with "/title"
$mech2 ->get("https://www.imdb.com$html"); #we complete the link
my $content = $mech2->content; #we take the content of the page
$tree->parse($content); #we access the url and we use the tree to find the strings that interest us
#genres = $tree->look_down ('class', 'see-more inline canwrap', #We have as criterion to access the class = "see-more .."
sub {
my $link = $_[0]->look_down('_tag','a'); #new conditions: <a> tags
$link->attr('href') =~ m{genres=}; #autres conditions: "genres" must be in the URL
}
);
}
}
my #genres1 = (); #we create a new list to insert the words found (the genres of films)
foreach my $e (#genres){ #we create a loop to browse the list
my $genre = $e->as_text; #the text of the list element is inserted into the variable
#genres1 = split(/[à| ]/,$genre); #we remove the unnecessary characters that are spaces, at and | which allow to keep that the terms of genre cine
}
foreach my $e (#genres1){ #another loop to filter listing errors (Genres: etc ..) and add the correct words to the hash table
if ($e ne ("Genres:" or "") ) {
$index{$e}++;
}
}
$tree->delete; #we delete the tree as we no longer need it
foreach my $cle (sort{$index{$b} <=> $index{$a}} keys %index){
print "$cle : $index{$cle}\n"; #we display the hash table with the genres and the number of times that appear in the filmography of the given actor
}
Thank you in advance for your help,
wobot
The IMDB Conditions of Use say this:
Robots and Screen Scraping: You may not use data mining, robots, screen scraping, or similar data gathering and extraction tools on this site, except with our express written consent as noted below.
So you might want to reconsider what you're doing. Perhaps you could look at the OMDB API instead.
I have a Form posting variables containing spaces in their names
e.g.
I perform my ajax request and i can see in chrome inspector that name is correctly passed "with blank space)
In my api.php:
Route::post('/user', 'UserController#get');
UserController
function get(Request $request)
{
dd($request->input('Name Surname')); //display null
dd($request->all()); //I notice the key's changed to Name_Surname
}
Taken that I can't change the names because they have to contain spaces (bad practice? ok but it has to be like that):
how can I avoid spaces to be replaced?
(maybe without to have to manipulate the request->all() returned array keys by hand....)
Short answer I don't believe there to be such a way.
You can map the response with a bit of string replace though:
$data = $request->all()->mapWithKeys(function($item, $key) {
return [str_replace("_", " ", $key) => $item];
});
If it's something you want to apply across the board, you could possible rig up some middleware to apply it to all requests.
If previous answer not work for you, try this:
$data = collect($request->all())->mapWithKeys(function($item, $key) {
return [str_replace("_", " ", $key) => $item];
})->toArray();
You may also normalize the Input Name if it is known...
$field_name = 'FIELD NAME WITH SPACES';
$value = request( str_replace( ' ', '_', $field_name ) );
I'm using Net::Amazon::EC2 to get some information about my instances.
I get all of the tags associated with an instance with:
my $tags = $ec2->describe_tags("Filter.Name" => "resource-id", "Filter.Value" => $instance_id);
According to the docs, this returns an array ref of DescribeTag objects.
I can iterate through the results:
foreach my $tag (#$tags) {
print $tag->key . " = " . $tag->value . "\n";
}
Is there a way I can get a tag with a specific key?
You could probably grep through them. Not very elegant, but I don't know the module you are using.
my #filtered_tags = grep { $_->key eq 'specific' } #$tags;
I am currently attempting to create a Perl webspider using WWW::Mechanize.
What I am trying to do is create a webspider that will crawl the whole site of the URL (entered by the user) and extract all of the links from every page on the site.
But I have a problem with how to spider the whole site to get every link, without duplicates
What I have done so far (the part im having trouble with anyway):
foreach (#nonduplicates) { #array contain urls like www.tree.com/contact-us, www.tree.com/varieties....
$mech->get($_);
my #list = $mech->find_all_links(url_abs_regex => qr/^\Q$urlToSpider\E/); #find all links on this page that starts with http://www.tree.com
#NOW THIS IS WHAT I WANT IT TO DO AFTER THE ABOVE (IN PSEUDOCODE), BUT CANT GET WORKING
#foreach (#list) {
#if $_ is already in #nonduplicates
#then do nothing because that link has already been found
#} else {
#append the link to the end of #nonduplicates so that if it has not been crawled for links already, it will be
How would I be able to do the above?
I am doing this to try and spider the whole site to get a comprehensive list of every URL on the site, without duplicates.
If you think this is not the best/easiest method of achieving the same result I'm open to ideas.
Your help is much appreciated, thanks.
Create a hash to track which links you've seen before and put any unseen ones onto #nonduplicates for processing:
$| = 1;
my $scanned = 0;
my #nonduplicates = ( $urlToSpider ); # Add the first link to the queue.
my %link_tracker = map { $_ => 1 } #nonduplicates; # Keep track of what links we've found already.
while (my $queued_link = pop #nonduplicates) {
$mech->get($queued_link);
my #list = $mech->find_all_links(url_abs_regex => qr/^\Q$urlToSpider\E/);
for my $new_link (#list) {
# Add the link to the queue unless we already encountered it.
# Increment so we don't add it again.
push #nonduplicates, $new_link->url_abs() unless $link_tracker{$new_link->url_abs()}++;
}
printf "\rPages scanned: [%d] Unique Links: [%s] Queued: [%s]", ++$scanned, scalar keys %link_tracker, scalar #nonduplicates;
}
use Data::Dumper;
print Dumper(\%link_tracker);
use List::MoreUtils qw/uniq/;
...
my #list = $mech->find_all_links(...);
my #unique_urls = uniq( map { $_->url } #list );
Now #unique_urls contains the unique urls from #list.
I've been banging my head over this issue for about 5 hours now, I'm really frustrated and need some assistance.
I'm writing a Perl script that pulls jobs out of a MySQL table and then preforms various database admin tasks. The current task is "creating databases". The script successfully creates the database(s), but when I got to generating the config file for PHP developers it blows up.
I believe it is an issue with referencing and dereferencing variables, but I'm not quite sure what exactly is happening. I think after this function call, something happens to
$$result{'databaseName'}. This is how I get result: $result = $select->fetchrow_hashref()
Here is my function call, and the function implementation:
Function call (line 127):
generateConfig($$result{'databaseName'}, $newPassword, "php");
Function implementation:
sub generateConfig {
my($inName) = $_[0];
my($inPass) = $_[1];
my($inExt) = $_[2];
my($goodData) = 1;
my($select) = $dbh->prepare("SELECT id FROM $databasesTableName WHERE name = '$inName'");
my($path) = $documentRoot.$inName."_config.".$inExt;
$select->execute();
if ($select->rows < 1 ) {
$goodData = 0;
}
while ( $result = $select->fetchrow_hashref() )
{
my($insert) = $dbh->do("INSERT INTO $configTableName(databaseId, username, password, path)".
"VALUES('$$result{'id'}', '$inName', '$inPass', '$path')");
}
return 1;
}
Errors:
Use of uninitialized value in concatenation (.) or string at ./dbcreator.pl line 142.
Use of uninitialized value in concatenation (.) or string at ./dbcreator.pl line 154.
Line 142:
$update = $dbh->do("UPDATE ${tablename}
SET ${jobStatus}='${newStatus}'
WHERE id = '$$result{'id'}'");
Line 154:
print "Successfully created $$result{'databaseName'}\n";
The reason I think the problem comes from the function call is because if I comment out the function call, everything works great!
If anyone could help me understand what's going on, that would be great.
Thanks,
p.s. If you notice a security issue with the whole storing passwords as plain text in a database, that's going to be addressed after this is working correctly. =P
Dylan
You do not want to store a reference to the $result returned from fetchrow_hashref, as each subsequent call will overwrite that reference.
That's ok, you're not using the reference when you are calling generate_config, as you are passing data in by value.
Are you using the same $result variable in generate_config and in the calling function? You should be using your own 'my $result' in generate_config.
while ( my $result = $select->fetchrow_hashref() )
# ^^ #add my
That's all that can be said with the current snippets of code you've included.
Some cleanup:
When calling generate_config you are passing by value, not by reference. This is fine.
you are getting an undef warning, this means you are running with 'use strict;'. Good!
create lexical $result within the function, via my.
While $$hashr{key} is valid code, $hashr->{key} is preferred.
you're using dbh->prepare, might as well use placeholders.
sub generateConfig {
my($inName, inPass, $inExt) = #_;
my $goodData = 1;
my $select = $dbh->prepare("SELECT id FROM $databasesTableName WHERE name = ?");
my $insert = $dbh->prepare("
INSERT INTO $configTableName(
databaseID
,username
,password
,path)
VALUES( ?, ?, ?, ?)" );
my $path = $documentRoot . $inName . "_config." . $inExt;
$select->execute( $inName );
if ($select->rows < 1 ) {
$goodData = 0;
}
while ( my $result = $select->fetchrow_hashref() )
{
insert->execute( $result->{id}, $inName, $inPass, $path );
}
return 1;
}
EDIT: after reading your comment
I think that both errors have to do with your using $$result. If $result is the return value of fetchrow_hashref, like in:
$result = $select->fetchrow_hashref()
then the correct way to refer to its values should be:
print "Successfully created " . $result{'databaseName'} . "\n";
and:
$update = $dbh->do("UPDATE ${tablename}
SET ${jobStatus}='${newStatus}'
WHERE id = '$result{'id'}'");
OLD ANSWER:
In function generateConfig, you can pass a reference in using this syntax:
generateConfig(\$result{'databaseName'},$newPassword, "php");
($$ is used to dereference a reference to a string; \ gives you a reference to the object it is applied to).
Then, in the print statement itself, I would try:
print "Successfully created $result->{'databaseName'}->{columnName}\n";
indeed, fetchrow_hashref returns a hash (not a string).
This should fix one problem.
Furthermore, you are using the variable named $dbh but you don't show where it is set. Is it a global variable so that you can use it in generateConfig? Has it been initialized when generateConfig is executed?
This was driving me crazy when I was running hetchrow_hashref from Oracle result set.
Turened out the column names are always returned in upper case.
So once I started referencing the colum in upper case, problem went away:
insert->execute( $result->{ID}, $inName, $inPass, $path );