OK, so I'm learning XPath for a basic application that is effectively scraping data off another website.
I need to get each person's country/state/suburb.
In some instances you can get Australia/Victoria/Melbourne for instance.
Others may just be Australia/Melbourne.
Or even just Melbourne OR just Australia.
So I'm currently able to read the markup below and pull out all of the information with the XPath string //table/tr/td/table/tr/td/font/a. This returns every entry in one flat list, but what I really want is to group each person's entries separately.
I hope someone out there on planet earth knows what I just tried to explain... and can help...
Good day!
The source document contains data like this:
<tr>
<td>
<font face="arial" size="2">
<strong>Location:</strong>
Australia,
<a href='http://maps.google.com/maps?q=Australia%20Victoria'target="mapblast" style='text-decoration:none'>Victoria</a>,
<a href='http://maps.google.com/maps?q=Australia%20Melbourne%20Victoria'target="mapblast" style='text-decoration:none'>Melbourne</a>
</font>
</td>
</tr>
To find each person's record, the XPath query is //table/tr/td/table/tr/td/font, or you could use //td/font[strong = 'Location:']. This will return a collection containing one element for each person.
To find the a elements under a particular font, evaluate the relative XPath a (or ./a) with that font element as the context node. You can also get them by iterating over the element's children collection.
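As a rough illustration of that grouping (not the asker's actual code), here is a minimal Python sketch using the standard library's xml.etree.ElementTree, which supports a small XPath subset. The SAMPLE markup is a cleaned-up, well-formed stand-in for the real page, and locations_per_person is a hypothetical helper name. A real page would probably need a forgiving HTML parser first.

```python
import xml.etree.ElementTree as ET

# Assumption: a well-formed stand-in for the scraped page, with two people.
SAMPLE = """
<table>
  <tr><td><font face="arial" size="2"><strong>Location:</strong> Australia,
    <a href="http://maps.google.com/maps?q=Australia%20Victoria">Victoria</a>,
    <a href="http://maps.google.com/maps?q=Australia%20Melbourne%20Victoria">Melbourne</a>
  </font></td></tr>
  <tr><td><font face="arial" size="2"><strong>Location:</strong> Australia,
    <a href="http://maps.google.com/maps?q=Australia%20Melbourne">Melbourne</a>
  </font></td></tr>
</table>
"""


def locations_per_person(markup):
    """Return one list of linked place names per person (hypothetical helper)."""
    root = ET.fromstring(markup)
    groups = []
    # Step 1: one <font> element per person.
    for font in root.findall(".//td/font[strong='Location:']"):
        # Step 2: the relative path "a" is evaluated from that <font> only.
        groups.append([a.text for a in font.findall("a")])
    return groups


groups = locations_per_person(SAMPLE)
print(groups)  # [['Victoria', 'Melbourne'], ['Melbourne']]
```

Each inner list holds one person's linked place names. The unlinked leading text (for example "Australia,") is not inside an a element; in ElementTree it is available as the tail of the strong element if you need it.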
I am looping over transaction.item to get the stock codes, which works like a charm.
But when I tried to get the link for each individual item, it populates every td tag. The link should exist ONLY on stock code 100132, but the rest of the items get the link too. I also double-checked the database for links on the other stock codes; it only exists on stock code 100132.
This is definitely weird and doesn't make any sense to me. Here's my code for the list:
<#list transaction.item as sdsitem>
<tr style="text-align: center">
<td class="th-border stockcode">${sdsitem.item}</td>
<td class="th-border sdslink">
<#if (sdsitem.item.custitemabco_sds_email_link)??>
<a href="${sdsitem.item.custitemabco_sds_email_link}"
target="_blank">Link only exists on stockcode 100132</a>
</#if>
</td>
</tr>
</#list>
Thank you so much to anyone who takes the time to help me. I'm a beginner at NetSuite and will really appreciate an answer. God bless!
I think the problem you're seeing can be answered by Suite Answer 98056. When an item is referenced on a transaction (Purchase/Sales/Work/Transfer Order etc), the fields that are found on the item record can not be directly accessed by using a dot to drill through the item.
Instead, you will need to create a new Transaction Item Field that is sourced from the item record field you want (here custitemabco_sds_email_link), and then use that new field in place of the dotted reference.
I am using Protractor to test an AngularJS application. I came across a scenario where I want to click on the image for different users, but the id of the image is the same for all (say 10) users. I did find one more element, a unique number allocated to each user. The markup for 2 different users is:
USER1:
<img id="searchPatientImgAdmittedM" class="img-circle picwidth" ng-click="getPatientVitalLabPharmacy(patient.patientId._id)" onclick="ShowHide(this)" src="icons/male.png" alt="" role="button" tabindex="0">
<span class="clearfloat ng-binding">12339</span>
USER2:
<img id="searchPatientImgAdmittedM" class="img-circle picwidth" ng-click="getPatientVitalLabPharmacy(patient.patientId._id)" onclick="ShowHide(this)" src="icons/male.png" alt="" role="button" tabindex="0">
<span class="clearfloat ng-binding">8841</span>
EDIT:
The full HTML code
<div class="col-md-10 col-sm-9 col-xs-9 skin-font-color paddingTop7">
<span class="skin-font-color">
<span class="name clearfloat ng-binding">KRISHA</span>
<span class="clearfloat ng-binding">12348</span>
<img id="searchPatientImgAdmittedF" class="img-circle picwidth" ng-click="getPatientVitalLabPharmacy(patient.patientId._id)" onclick="ShowHide(this)" src="icons/femaleImages.jpg" alt="" role="button" tabindex="0">
</div>
I tried to do :
element(by.id('searchPatientImgAdmittedF')).all(by.tagName('12348')).click();
// or
element(by.id('searchPatientImgAdmittedF')).element(by.tagName('12348')).click();
How can I combine locators to click on a particular user? Only the image part is clickable.
Thanks for your additions.
Now you're trying to click on a sibling element. There are several approaches to do so.
The one I'm usually using is:
element(by.cssContainingText('span.clearfloat','12348')).element(by.xpath('..')).$('#searchPatientImgAdmittedF').click();
//equal to
element(by.cssContainingText('span.clearfloat','12348')).element(by.xpath('..')).element(by.id('searchPatientImgAdmittedF')).click();
This evaluates first the identifiable tag with the unique number, then climbs up to its parent element, then from there gets the img-element with the ID.
See also: the $() selector and the cssContainingText() selector.
Another option would be to use isElementPresent(), which checks whether a child element exists. However, the code is (from my point of view) more complex, and I don't see how cssContainingText() could be used there, so I won't attempt it here.
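The same "find the unique number, step up to the parent, then take the image" logic can be sketched outside Protractor. This is only an illustration in Python with xml.etree.ElementTree (Python 3.7+ is assumed for the [.='text'] predicate), not Protractor code. SNIPPET is a well-formed stand-in for the page with a second, made-up patient added, and image_for_patient is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumption: well-formed stand-in markup; the second patient is invented
# to show that the shared id alone cannot tell the images apart.
SNIPPET = """
<body>
  <div>
    <span class="skin-font-color">
      <span class="name clearfloat ng-binding">KRISHA</span>
      <span class="clearfloat ng-binding">12348</span>
      <img id="searchPatientImgAdmittedF" src="icons/femaleImages.jpg" alt=""/>
    </span>
  </div>
  <div>
    <span class="skin-font-color">
      <span class="name clearfloat ng-binding">SECOND</span>
      <span class="clearfloat ng-binding">99999</span>
      <img id="searchPatientImgAdmittedF" src="icons/other.jpg" alt=""/>
    </span>
  </div>
</body>
"""


def image_for_patient(markup, number):
    """Find the <img> that shares a parent with the span holding `number`."""
    root = ET.fromstring(markup)
    # span with exactly this text -> its parent ("..") -> the img child
    return root.find(f".//span[.='{number}']/../img")


img = image_for_patient(SNIPPET, "12348")
print(img.get("src"))  # icons/femaleImages.jpg
```

The point of the sketch is the shape of the path: anchor on the one thing that is unique (the number), then navigate relative to it, rather than relying on the duplicated id or on an absolute path from the document root.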
Thanks for your quick help in solving my issue. I want to add here that I found the answer to my problem and now I am able to click on the particular user I want from the list of many users. The code I am using is :
element(by.cssContainingText('span.clearfloat','12339'))
.element(by.xpath('/html/body/div[3]/div[1]/div[17]/div/div/table[4]/tbody/tr[3]/td[1]/div[1]/img'))
.click();
This finds the child element first and then the parent element. The id was the same for all the users, so it could not be used on its own; I used only XPath along with the unique number.
Thanks again for the help.
So, my issue is that, when I'm extracting data, there are a couple of entries on the page that, because there isn't a link also associated with them, they don't get selected:
To better explain here is the hxs.select statement that gets almost all of the data:
opening = hxs.select('//div[@id="body"]/div/table/tr/td/table/tr[2]/td/table[2]/tr/td[7]/font/a/text()').extract()
This statement gets all but 3 opening movie dates. The three missing dates, as I mentioned, don't have a link associated with them and are actually found at:
hxs.select('//div[@id="body"]/div/table/tr/td/table/tr[2]/td/table[2]/tr/td[7]/font/text()').extract()
*Notice: there is no /a found at the end.
I would just add an additional statement to get these, but I need all of the information in order. I also have statements that get a movie title and grossing amount. I then take these statements and iterate through them to pair them up with where they belong- I can't do this if I add another statement to separately deal with them. Any suggestions?
::::Data:::::
Here is the url of the data I'm trying to get BoxOfficeMojo
A quick note: if you use Firebug to view the XPath, it shows a tbody element that doesn't actually exist in the source (the browser adds it in).
Here is what a normal opening date looks like:
<td bgcolor="#ffffff" align="right">
<font size="2">
6/11/2010
</font>
</td>
Here is what one of the 'problem' opening dates looks like:
<td bgcolor="#f4f4ff" align="right">
<font size="2">11/20/1981</font>
</td>
Just select all text nodes within that <font/> element using the descendant-or-self-axis step //.
//div[@id="body"]/div/table/tr/td/table/tr[2]/td/table[2]/tr/td[7]/font//text()
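To show why selecting all descendant text keeps the dates in document order, here is a small Python sketch. It deliberately uses the standard library's xml.etree.ElementTree rather than Scrapy's selector, with itertext() standing in for font//text(). The ROWS markup, the href value and the opening_dates name are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Assumption: simplified stand-in rows. The first date is wrapped in a link
# (the href is invented); the second is bare text, like the problem rows.
ROWS = """
<table>
  <tr><td><font size="2"><a href="/schedule/?date=2010-06-11">6/11/2010</a></font></td></tr>
  <tr><td><font size="2">11/20/1981</font></td></tr>
</table>
"""


def opening_dates(markup):
    """Collect the text of every <font> cell, linked or not, in order."""
    root = ET.fromstring(markup)
    # itertext() yields the element's own text and all descendant text,
    # which is roughly what font//text() selects in XPath.
    return ["".join(font.itertext()).strip() for font in root.iter("font")]


print(opening_dates(ROWS))  # ['6/11/2010', '11/20/1981']
```

Because there is exactly one result per font cell, the list stays aligned with the separately extracted titles and grosses, which is what the pairing step needs.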
I'm trying to extract text from an old vBulletin forum using WWW::Mechanize and Mojo::DOM.
vBulletin doesn't use HTML and CSS for semantic markup, and I'm having trouble using Mojo::DOM->children to get at certain elements.
These vBulletin posts are structured differently depending on their content.
Single message:
<div id="postid_12345">The quick brown fox jumps over the lazy dog.</div>
Single message quoting another user:
<div id="postid_12345">
<div>
<table>
<tr>
<td>
<div>Quote originally posted by Bob</div>
<div>Everyone knows the sky is blue.</div>
</td>
</tr>
</table>
</div>
I disagree with you, Bob. It's obviously green.
</div>
Single message with spoilers:
<div id="postid_12345">
<div class="spoiler">Yoda is Luke's father!</div>
</div>
Single message quoting another user, with spoilers:
<div id="postid_12345">
<div>
<table>
<tr>
<td>
<div>Quote originally posted by Fred</div>
<div class="spoiler">Yoda is Luke's father!</div>
</td>
</tr>
</table>
</div>
<div class="spoiler">No waaaaay!</div>
</div>
Assuming the above HTML and an array packed with the necessary post IDs:
for (@post_ids) {
$mech->get($full_url_of_specific_forum_post);
my $dom = Mojo::DOM->new($mech->content);
my $div_id = '#postid_' . $_;  # at() takes a CSS selector, so an id needs the leading '#'
say $dom->at($div_id)->children('div')->first;
say $dom->at($div_id)->text;
}
Using $dom->at($div_id)->all_text gives me everything in an unbroken line, which makes it difficult to tell what's quoted and what's original in the post.
Using $dom->at($div_id)->text skips all of the child elements, so quoted text and spoilers are not picked up.
I've tried variations of $dom->at($div_id)->children('div')->first, but this gives me everything, including the HTML.
Ideally, I'd like to be able to pick up all the text in each post, with each child element on its own line, e.g.
POSTID12345:
+ Quote originally posted by Bob
+ Everyone knows the sky is blue.
I disagree with you, Bob. It's obviously green.
I'm new to Mojo and rusty with Perl. I wanted to solve this on my own, but after looking over the documentation and fiddling with it for a few hours, my brain is mush and I'm at a loss. I'm just not getting how Mojo::DOM and Mojo::Collections work.
Any help will be greatly appreciated.
Looking at the source of Mojo::DOM, the all_text method basically walks the DOM recursively and extracts all text. Use that source as a model to write your own DOM-walking function. Its recursive function returns a single string; yours could instead return an array carrying whatever context you need.
EDIT:
After some discussion on IRC, the web scraping example has been updated; it might help guide you. http://mojolicio.us/perldoc/Mojolicious/Guides/Cookbook#Web_scraping
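The "write your own recursive walk that returns a list" idea can be sketched as follows. This is not Mojo::DOM and not its all_text implementation; it is a minimal Python illustration with xml.etree.ElementTree, assuming a well-formed copy of the quoted-post example. The walk function and its "+ " prefix convention are assumptions chosen to match the output format asked for in the question.

```python
import xml.etree.ElementTree as ET

# Assumption: a well-formed copy of the "quoting another user" post.
POST = """<div id="postid_12345">
  <div><table><tr><td>
    <div>Quote originally posted by Bob</div>
    <div>Everyone knows the sky is blue.</div>
  </td></tr></table></div>
  I disagree with you, Bob. It's obviously green.
</div>"""


def walk(elem, depth=0):
    """Return one line per text chunk; chunks inside child elements get '+ '."""
    prefix = "+ " if depth else ""
    lines = []
    if elem.text and elem.text.strip():
        lines.append(prefix + elem.text.strip())
    for child in elem:
        lines.extend(walk(child, depth + 1))
        # Text following a child belongs to the current element, not the child.
        if child.tail and child.tail.strip():
            lines.append(prefix + child.tail.strip())
    return lines


lines = walk(ET.fromstring(POST))
print("\n".join(lines))
# + Quote originally posted by Bob
# + Everyone knows the sky is blue.
# I disagree with you, Bob. It's obviously green.
```

Returning a list instead of a single string is the design choice that matters here: the caller can then decide how to mark quotes or spoilers, for example by passing the element's class along with each line.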
There is a module to flatten an HTML tree, HTML::Linear.
The explanation of the purpose of flattening an HTML tree is a bit long and boring, so here's a picture showing the output of the xpathify tool, bundled with that module:
As you see, the HTML tree nodes become a single key/value list, where the key is the XPath for that node and the value is the node's text attribute.
In a few keystrokes, this is how you use HTML::Linear:
#!/usr/bin/env perl
use strict;
use utf8;
use warnings;
use Data::Printer;
use HTML::Linear;
my $hl = HTML::Linear->new;
$hl->parse_file(q(vboard.html));
for my $el ($hl->as_list) {
my $hash = $el->as_hash;
next unless keys %{$hash};
p $hash;
}
I am checking on HTML content on my page, and I've got the split down to have the variable left with this content:
">
<td>Oklahoma City</td>
<td>Oklahoma</td>
<td>OK</td>
<td>405</td>
<td>CST</td>
</tr>
</table>
<div id="
Those are dynamic pages I'm checking, so the data will always be different, but the layout the same...
How can I get the value out of the second <td> if that html is in 1 variable(string)?
It was a full page. I've used explode twice to remove everything above a div and everything below the last div id, so it has some open HTML tags left, because I did not know how to get rid of those along the way to be left with just this:
<td>Oklahoma City</td>
<td>Oklahoma</td>
<td>OK</td>
<td>405</td>
<td>CST</td>
</tr>
</table>
Can you tell me how to get that out? I just need the second one because it is the county and that is what I'm checking on...