How can I find a specific piece of a webpage using Mojo::DOM? - perl

Below is an extract from an IMDb page, which I purposely kept short. My end goal is to get the two links, but I can't even figure out how to select a specific div by its id, because the class below is used all over the page. I've Googled for an example that uses both class and id, but I still can't find the solution.
P.S. The only reason I have Dumper in there is so that when I run it, I can instantly see that I still haven't got it.
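use Mojo::UserAgent;
use Data::Dumper;
# $newrip holds the URL of the IMDb page
my $ua = Mojo::UserAgent->new( max_redirects=>3, timeout => 30 );
my $dom = $ua->get($newrip)->result->dom;
# Matches every div with class "article", not just the one I want
my $module_list = $dom->find('div.article');
print Dumper $module_list;
exit;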
<div class="article" id="titleDetails">
<span class=rightcornerlink>
Edit
</span>
<h2>Details</h2>
<div class="txt-block">
<h4 class="inline">Official Sites:</h4>
<a href="/offsite/?page-action=offsite-facebook&token=BCYtlGRCvvzcjTOrSRBQqjTPuEUBGkxbnkfQjRYZi0XJxm-A-A4vf0mzJF5WqH6HYLt2TZCuVR7c%0D%0A209QQMCwUe-51EwtDDYbNczYCnFRIRzctUhoXJCF2gsQJw6m050sV9g0sTJJEfiGP37rfeIIoXMS%0D%0ACfj2qgUNCaL2YaP_FeWVGCg39Bw-3dRsP5cB1Wk9FfobPd5tG8Q4WjVbUR2pTOvE0Pkc5QUK5E7U%0D%0AX7O9awNb0Kw%0D%0A&ref_=tt_pdt_ofs_offsite_0"
rel="nofollow">Official Facebook</a>
<span class="ghost">|</span>
Official site
<span class="see-more inline"></span>
</div>
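The goal described above (pick one div by its id, then collect the links inside it) can be sketched independently of any particular library. The following is a minimal Python illustration using only the standard library, not Mojo::DOM itself; the class name `LinksInsideId`, the helper `links_inside`, and the shortened href values in `SAMPLE` are made up for this example. In a CSS-selector-based parser the same idea is usually written as a selector along the lines of `div#titleDetails a`, since an id selector narrows the match to a single element.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class LinksInsideId(HTMLParser):
    """Collect href values of <a> tags nested inside the element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # 0 means we are currently outside the target element
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth:
            # Already inside the target: track nesting and record links.
            if tag not in VOID_TAGS:
                self.depth += 1
            if tag == "a" and "href" in attrs:
                self.links.append(attrs["href"])
        elif attrs.get("id") == self.target_id:
            # Found the element with the wanted id.
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1


def links_inside(html, target_id):
    parser = LinksInsideId(target_id)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical, shortened markup modelled on the extract above.
SAMPLE = """
<div class="article" id="other"><a href="/ignored">no</a></div>
<div class="article" id="titleDetails">
  <h2>Details</h2>
  <div class="txt-block">
    <a href="/offsite/?page-action=offsite-facebook" rel="nofollow">Official Facebook</a>
    <span class="ghost">|</span>
    <a href="/offsite/?page-action=offsite-site">Official site</a>
  </div>
</div>
<a href="/also-ignored">outside</a>
"""

print(links_inside(SAMPLE, "titleDetails"))
```

The depth counter assumes reasonably well-formed markup; a real page with unclosed tags would need a more forgiving parser.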

Related

How to locate multiple text element within class in Protractor?

In my Protractor test I need to verify the highlighted text (Lexington, KY).
<li id="address" class="list">
<div class="content">
<small class="mb-1">
<span>
Suite # 278
<br>
</span>
**Lexington, KY**
</small>
</li>
How can I verify the highlighted text using a css or cssContainingText locator?
The Protractor creators have put great documentation in place, so please read it thoroughly to learn how to use css and cssContainingText. In short, use element(by.cssContainingText('.content', 'Lexington')).
UPDATE 1:
If you want to add an assertion, compare the element's text rather than the element itself: expect(element(by.cssContainingText('.content', 'Lexington')).getText()).toContain('Lexington, KY')
For one, I am confused because it seems like you never close the content div. Is it closed after the li is closed?
Anyway...I would simply change the HTML so that you don't need some crazy convoluted mess of a selector. I would do it like this:
<li id="address" class="list">
<div class="content">
<small class="mb-1">
<span>
Suite # 278
<br>
</span>
<cityState>Lexington, KY</cityState>
</small>
</li>
function checkCityState(){
return element(by.tagName('cityState')).getText();
}
expect(checkCityState()).toBe('Lexington, KY');

Dashing - dynamically delete widgets

How can I dynamically delete widgets from a job (rb) in Dashing?
I am building the dashboard dynamically by sending data to the erb file:
<div class="gridster">
<ul>
<% settings.servers.each do |data| %>
<li data-row="1" data-col="1" data-sizex="1" data-sizey="1">
<div data-id="<%=data['webHost']%>" data-title="<%=data['name']%>" data-version="<%=data['Version']%>" ></div>
</li>
<% end %>
</ul>
</div>
Yes. I wrote a simple example job that can do just that here:
http://www.mapledyne.com/ideas/2015/6/30/delete-a-dashing-dashboard-widget
You basically just want to manipulate the Sinatra::Application.settings.history variable, but the code in that link should get you most of the way to where you want to be.
Or skip the post and go right to the gist file:
https://gist.github.com/mapledyne/6fb671c17c3f865309f3#file-delete-widget-rb
You can also generate parts of the erb dynamically if you don't know the widgets in the first place (more complicated), but it starts with the same - leveraging that same history variable.

Using xpath to parse out html attributes from webpage

I am having trouble extracting some attributes out of an html page and need some ideas to help me get unstuck.
I am using PowerShell and am using the htmlagilitypack to help me parse the html. I have a very crude version that I was able to do with regex but it doesn't always work so I thought the better option would be to use xpath to parse the results. If regex is the way to go please let me know.
So far I have been able to grab the page that I am interested in and split it apart by rows.
$results = $htmldoc.DocumentNode.SelectNodes("//p[@class='row']")
After the page is split up I am trying to iterate through each row using xpath to grab the information I am interested in.
ForEach ($item in $results) {
$ID=$null
$ID = $item.OuterHtml
}
This gets me close to what I want, but it grabs a bunch of other info that I don't want as well. Here is what $item.OuterHtml looks like at this point.
OuterHtml : <p class="row" data-latitude="41.5937565437255" data-longitude="-93.6437636649079" data-pid="4184719674">
<span class="star"></span> <span class="pl"> <span class="date">Nov 27</span> iPhone and other Cell Phone Unlocks
</span> <span class="l2"> <span class="pnr"> <small> (Des Moines)</small> <span class="px"> <span class="p"> <a href="#" class="maptag"
data-pid="4184719674">map</a></span></span> </span> <a class="gc" href="/mod/" data-cat="mod">cell phones - by dealer</a> </span> </p>
I just want the data-pid attribute.
I have tried a bunch of other ways to extract the data-pid attribute but haven't had any success. Here is one such method I have tried, but it keeps returning the same value over and over.
$ID = $Date.DocumentNode.SelectSingleNode("//p/@data-pid")
I have a feeling that this is something simple but have hit a roadblock. Let me know what other information I need to post.
In your foreach loop you should be able to get the attribute's value like this:
$ID = $item.GetAttributeValue("data-pid", "")
To walk all the attributes on that node try:
$item.Attributes | Select Name,Value
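To illustrate the idea behind this answer (read the attribute from each row node, rather than re-running an absolute `//` query that starts again at the document root and keeps returning the first match), here is a minimal Python sketch using the standard library instead of HtmlAgilityPack. The names `RowPidCollector` and `row_pids` are made up for this example, and the second row in `SAMPLE` is invented to show more than one result.

```python
from html.parser import HTMLParser


class RowPidCollector(HTMLParser):
    """Collect the data-pid attribute of every <p> whose class list contains "row"."""

    def __init__(self):
        super().__init__()
        self.pids = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        # Only the row element itself counts; nested tags that also carry
        # a data-pid (such as the map link) are skipped.
        if tag == "p" and "row" in classes and "data-pid" in attrs:
            self.pids.append(attrs["data-pid"])


def row_pids(html):
    collector = RowPidCollector()
    collector.feed(html)
    collector.close()
    return collector.pids


# First row is trimmed from the question; the second row is hypothetical.
SAMPLE = """
<p class="row" data-latitude="41.59" data-longitude="-93.64" data-pid="4184719674">
  <span class="pl"><span class="date">Nov 27</span> iPhone and other Cell Phone Unlocks</span>
  <a href="#" class="maptag" data-pid="4184719674">map</a>
</p>
<p class="row" data-pid="4184719675"><span class="pl">Second listing</span></p>
"""

print(row_pids(SAMPLE))
```

Each row contributes exactly one value, which mirrors calling GetAttributeValue on `$item` inside the loop.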

Can't target iframe with watir-webdriver - Bug?

I am pretty new to automation and watir-webdriver so forgive me if I don't sound super techy.
I am trying to log in to a website and the login form is inside of an iframe. There is also another iframe on the same page that contains an image.
This is the html:
<body>
<div class="topbar">
<div class="topbarcenter">
<ul>
<li id="logo" class="logo">
<div id="provider_logo">
<iframe id="logo_iframe" width="192px" height="128px" frameborder="0" src="http://social.onerecovery.com/modules/iframes/html/provider_logo.html?prov=microsites" onload="this.style.visibility = "visible";" style="visibility: visible;" allowtransparency="true">
</div>
</li>
<li class="login">
<iframe id="login_iframe" width="550px" height="70px" frameborder="0" src="http://social.onerecovery.com/modules/iframes/html/login.html" onload="this.style.visibility = "visible";" style="visibility: visible;" allowtransparency="true">
<html>
<head>
<body>
<div class="login_container">
<div id="login_div">
<form class="login_form" action="#" method="post">
<input type="text" maxlength="100" placeholder="Email Address..." class="email_input processed" name="email">
By the way, I am using watir-webdriver 0.3.5 and automating Chrome 17.
What I tried was:
$b.frame(:id => "login_iframe").form(:class => "login_form").text_field(:name => "email").set("username")
which I thought would work but in my command line I just get the error: Watir::Exception::UnknownObjectException: unable to locate element, using {:class=>"login_form", :tag_name=>"form"}
I also tried indexing the iframe to make sure I was in the second iframe and not the first but it still didn't work.
When I do
$b.frame(:id => "login_iframe").exists? in command line, I get
true
but when I do
$b.frame(:id => "login_iframe").form(:class => "login_form").exists? in command line, I get
false
The thing is that we have another page that someone can use to login to the same website and the only difference between that page and this page is that this page has a second iframe whereas the other page only has the login iframe and the code
$b.frame(:id => "login_iframe").form(:class => "login_form").text_field(:name => "email").set("username")
works perfectly fine.
Sorry for going on so long. Just wanted to make sure that I gave enough info. Thanks in advance for any help.
The short answer to solving your problem is to use browser.frame(:index => 2) instead of browser.frame(:id => "login_iframe").
Or if you want a slightly more robust solution:
frame = browser.frames.find{ |frame| frame.form(:class => "login_form").exists? }
frame.form(:class => "login_form").text_field(:name => "email").set("username")
That said, I really do not know why that works. It is like it thinks the login control is inside the invite_iframe, which it does not look like in the HTML. I will try to dig deeper, but sounds like a bug to me.
It appears that all of the iframes are shuffled funny. As you can see by the following, the number of text fields in each iframe does not match what is expected.
browser.frames.each{ |x| puts x.id + ' - ' + x.text_fields.length.to_s + ' text_fields' }
#=> logo_iframe - 3 text_fields
#=> login_iframe - 0 text_fields
#=> invite_iframe - 2 text_fields
In the latest versions, use browser.iframes, which lists all the iframes in the current window.
browser.iframes.map {|iframe| iframe.src}
This will map src attributes of all iframes.

jQuery inconsistent .remove by class on element with multiple classes

I've got a page where messages and associated elements (responses, forwards, etc) all share a class based on the database id of the parent.
For example
<pre>
<div id="recentMessages">
<div id="a3" class="message a3">this is a message</div>
<div id="a5" class="message a5">this is another message</div>
</div>
<div id="recentComments">
<div id="a3" class="comment a3">this is a comment</div>
<div id="a5" class="comment a5">this is another comment</div>
</div>
<div id="recentActions">
<div id="a3" class="action a3">tim posted a new message</div>
<div id="a4" class="action a4">sara forwarded a message to john</div>
</div>
</pre>
At times I need to remove all elements with the same id, so I originally had
jQuery('div#'+id).remove();
but that would sometimes not remove all of the elements, because ids are supposed to be unique.
So I added the id as a class. Now I use
jQuery('div.'+id).remove();
but this seems to be only about 80% effective, and sometimes the divs aren't removed.
I'm not sure whether the issue is that the div has more than one class, but I need the classes because that is how I refer to the elements when somebody clicks.
For instance,
jQuery('div.message').click(function(){
// get the id, send it to the server and get the message
});
Is there something I'm doing wrong here? Or is there a better way to do this?
Looks like this was an issue where a function was being called using a variable which had already been defined. I didn't realize this would cause a problem.
For instance:
jQuery('div','div#recentActions').click(function(){
var removeId=jQuery(this).attr('id').replace('','a');
removeDiv(removeId);
});
function removeDiv(removeId){
jQuery('div#a'+removeId).remove();
}
I can't say for sure this was the issue, but changing the function to:
function removeDiv(cancelId){
jQuery('div#a'+cancelId).remove();
}
seems to be working.
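Since the markup in the question reuses ids such as a3 across several containers, and ids are supposed to be unique, it can help to check a page for duplicate ids before relying on id selectors. Here is a minimal Python sketch of such a check using only the standard library; the names `IdCounter` and `duplicate_ids` are made up for this example, and `SAMPLE` is the markup from the question.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCounter(HTMLParser):
    """Count how many times each id attribute value occurs in a document."""

    def __init__(self):
        super().__init__()
        self.ids = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids[value] += 1


def duplicate_ids(html):
    counter = IdCounter()
    counter.feed(html)
    counter.close()
    # Report, in sorted order, every id that appears more than once.
    return sorted(i for i, n in counter.ids.items() if n > 1)


SAMPLE = """
<div id="recentMessages">
<div id="a3" class="message a3">this is a message</div>
<div id="a5" class="message a5">this is another message</div>
</div>
<div id="recentComments">
<div id="a3" class="comment a3">this is a comment</div>
<div id="a5" class="comment a5">this is another comment</div>
</div>
<div id="recentActions">
<div id="a3" class="action a3">tim posted a new message</div>
<div id="a4" class="action a4">sara forwarded a message to john</div>
</div>
"""

print(duplicate_ids(SAMPLE))
```

Any id reported here is one where an id-based selector may match fewer elements than expected, which is why selecting by the shared class is the safer choice.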