How do I find the second div with the same class name?

I don't even know how to properly ask this.
I just started with python, and I'm trying to make a crawler.
Everything works fine but I can't "call" or "find" the second div with identical class names in the body.
I've been searching the internet for help, but the way people write their code is not similar to what I wrote.
So the HTML looks something like this:
<div class="card">
<div class="card-body">...</div>
<div class="card-body">...</div>
</div>
My code:
comp_link = comp_card.find('a', class_ = 'link')
href_link = comp_link['href']
link_final = 'https://www.someweb.com' + href_link
prof_text = requests.get(link_final).text
prof_soup = BeautifulSoup(prof_text, 'lxml')
comp_name = prof_soup.find('h2', class_ = 'company-name').text.strip()
comp_info = prof_soup.find('div', class_ ='col-md-12 col-lg-4')
but when I try to use
comp_info = comp_info.find('div', class_ = 'card-body'[1])
it doesn't work.
I've tried to experiment and to use other people's solutions from StackOverflow (but I'm too dumb).
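Why the attempt above fails, sketched in plain Python: the [1] is applied to the string literal 'card-body' itself, before find() is ever called, so the search is really for a div with the class a. The snippet only evaluates that expression; the variable name class_arg is made up for illustration.

```python
# The subscript is evaluated on the string literal first, before find() runs,
# so class_='card-body'[1] is exactly the same as class_='a'.
class_arg = 'card-body'[1]
print(repr(class_arg))  # 'a'

# The index belongs on the list of results instead, for example:
# comp_info.find_all('div', class_='card-body')[1]
```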

Often, I prefer using CSS selectors. In this simple case you could select the second child that has the class name card-body. You can use the nth-child selector to grab the second div:
import bs4
html = """
<div class="card">
<div class="card-body">Not this</div>
<div class="card-body">But this</div>
</div>
"""
soup = bs4.BeautifulSoup(html, 'html.parser')
print(soup.select('div.card-body:nth-child(2)'))
Output
[<div class="card-body">But this</div>]
If you happen to be in a situation where the targeted element is not actually the second child, but simply the second element with the class card-body, it may be advantageous to use nth-child(n of selector). This selects the second element among the siblings that match the specified selector:
html = """
<div class="card">
<div class="other-class">Not this</div>
<div class="card-body">Or this</div>
<div class="card-body">But this</div>
</div>
"""
soup = bs4.BeautifulSoup(html, 'html.parser')
print(soup.select('div:nth-child(2 of .card-body)'))
Output
[<div class="card-body">But this</div>]
BeautifulSoup's CSS selector logic is driven by the SoupSieve library, and more information can be found here: https://facelessuser.github.io/soupsieve/selectors/pseudo-classes/#:nth-child.
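The answer above sticks to CSS selectors; a minimal sketch of the other common route is to collect all matches with find_all() and index the resulting list, which is what the [1] in the question was aiming for. It assumes a trimmed-down stand-in for the page (the wrapper div and the texts are invented) and uses the built-in html.parser instead of lxml so it runs without an extra parser. Passing class_="col-md-12 col-lg-4" matches the exact attribute string, so the class order has to match the page.

```python
from bs4 import BeautifulSoup

# Assumed stand-in for the profile page fetched in the question
html = """
<div class="col-md-12 col-lg-4">
  <div class="card">
    <div class="card-body">Not this</div>
    <div class="card-body">But this</div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
# Same lookup as in the question's code
comp_info = soup.find("div", class_="col-md-12 col-lg-4")

# find_all() returns a list-like ResultSet, so the index goes on the result
card_bodies = comp_info.find_all("div", class_="card-body")
second = card_bodies[1]
print(second.get_text(strip=True))  # But this

# The CSS equivalent: select() also returns a list
second_via_css = comp_info.select("div.card-body")[1]
print(second_via_css.get_text(strip=True))  # But this
```

If the page might have fewer than two card-body divs, check len(card_bodies) before indexing, since [1] raises IndexError on a one-element list.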

Related

jQuery.on() for children with no particular selector

I have HTML structure like this:
<div class="parent">
<div class="child">
<div class="something">...</div>
</div>
<div class="child">
<div class="something-else">...</div>
</div>
<div class="child">
...
</div>
...
</div>
I catch events (like click) on .child elements like this:
$('.parent').on('click', '.child', function() { ... });
However, I would like to get rid of the explicit class specification and rely only on the direct parent-child relationship itself.
I want to write the code which would not require any particular classes for children elements. Closest thing to this is:
$('.parent').on('click', '*', function() { ... });
But obviously such handler will spread on deeper descendants (.something, .something-else etc.), not only on the first level.
Is there a way to achieve what I'm looking for, either by using something instead of * or in some other way?
P.S. I don't want to use direct binding - $('.parent').children().click(function() {...}); - as it is slower and will not work in case of children being dynamically added.
The selector you are looking for is > *:
$('.parent').on('click', '> *', function() { ... });
(The actual solution was suggested by Josh Crozier in the comments, I just reposted it as an answer.)
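The difference between a bare * and > * (all descendants versus direct children only) is not specific to jQuery. As a rough illustration outside the browser, here it is with Python's standard xml.etree.ElementTree, which is a swapped-in tool rather than anything jQuery uses; the markup is a simplified, well-formed version of the question's HTML.

```python
import xml.etree.ElementTree as ET

html = """
<div class="parent">
  <div class="child"><div class="something">a</div></div>
  <div class="child"><div class="something-else">b</div></div>
  <div class="child">c</div>
</div>
"""
parent = ET.fromstring(html.strip())

# '*' in ElementTree's path syntax matches direct children only, like '> *'
direct = parent.findall("*")
# './/*' matches every descendant at any depth, like a bare '*'
everything = parent.findall(".//*")

print(len(direct), len(everything))  # 3 5
```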

Using Protractor to select elements with by.repeater()

Using the Protractor element() and by.repeater() API methods below:
var targetRowText = 'Sales';
var targetGridName = 'myGrid';
var sel = 'grid-directive[grid-name="' + targetGridName + '"] .col-freeze .grid-wrapper';
var gridRows = element(by.css(sel)).all(by.repeater('row in vm.sourceData.data'));
var result = gridRows.all(by.cssContainingText('span', targetRowText)).first();
I am able to select the following row element from a grid which I have labeled myGrid:
<div id="rowId_21" ng-class-odd="'row-2'" ng-class-even="'row-3'" ng-class="vm.hideRow(row)" class="row-3 height-auto">
<div ng-repeat="column in vm.sourceData.columns" >
<div ng-if="!column.subCols" class="ng-scope">
<div ng-if="row[column.field].length !== 0" class="ng-scope highlight21">
<span ng-bind-html="row[column.field] | changeNegToPrenFormat" vm.highlightedrow="" class="ng-binding">
Sales
</span>
</div>
</div>
</div>
</div>
Please note that I have used by.cssContainingText() to look up the "Sales" span element.
MY PROBLEM:
Now that I have located this row in var result, how can I retrieve the id attribute of the outermost div?
In other words, I need to select <div id="rowId_21" so that I can reuse id="rowId_21" in a subsequent Protractor selector.
In jQuery, for example, I could use brute force to get that outer div id as follows:
var el = $('grid-directive[grid-name="Sales"] .col-freeze .grid-wrapper #rowId_21 span');
el[0].parentElement.parentElement.parentElement.parentElement;
Here's a high-level outline of what I mean. The grid actually separates the left-most column from the actual data rows, so there are two distinct divs that accomplish this:
<div grid-directive grid-name="myGrid">
<div class="col-freeze" >
<!-- CONTAINS LEFT-MOST "CATEGORIES" COLUMN -->
</div>
<div class="min-width-grid-wrapper">
<!-- CONTAINS THE DATA ROWS-->
</div>
</div>
However, I'm struggling to do this in Protractor.
Advice is appreciated...
Bob
A straightforward option would be to start from result and get to the desired parent element using the ancestor axis:
result.element(by.xpath("./ancestor::div[starts-with(@id, 'rowId')]")).getAttribute("id").then(function (parentId) {
// use parentId here
});
Though I think that going down and then back up the tree should be considered a sign that you are not approaching the problem in an easy and correct way.
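For comparison, the same "walk up to the nearest ancestor whose id starts with rowId" idea can be sketched with Python's xml.etree.ElementTree. ElementTree has no ancestor axis and its elements don't know their parent, so this builds a child-to-parent map first; the markup is a cut-down version of the question's row, and ancestor_id is a hypothetical helper, not part of Protractor or any library.

```python
import xml.etree.ElementTree as ET

# Cut-down, well-formed version of the row from the question
html = """
<div id="rowId_21" class="row-3 height-auto">
  <div>
    <div class="ng-scope">
      <div class="ng-scope highlight21">
        <span class="ng-binding">Sales</span>
      </div>
    </div>
  </div>
</div>
"""
root = ET.fromstring(html.strip())

# ElementTree elements have no parent pointer, so map each child to its parent
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestor_id(elem, prefix):
    """Walk up from elem (excluding elem itself) and return the first id starting with prefix."""
    node = parent_of.get(elem)
    while node is not None:
        node_id = node.get("id", "")
        if node_id.startswith(prefix):
            return node_id
        node = parent_of.get(node)
    return None

# Rough stand-in for by.cssContainingText('span', 'Sales')
span = next(s for s in root.iter("span") if (s.text or "").strip() == "Sales")
print(ancestor_id(span, "rowId"))  # rowId_21
```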

Selecting a DOM Element when (auto-generated) HTML is not well formed

I'm trying to select a control in order to manipulate it, but I'm having a problem: I can't select it. Maybe it's because of the XML structure, but I really can't change it because it is externally created. So I have this:
<span class="xforms-value xforms-control xforms-input xforms-appearance xforms-optional xforms-enabled xforms-readonly xforms-valid " id="pName">
<span class="focus"> </span>
<label class="xforms-label" id="xsltforms-mainform-label-2_2_4_3_">Name:</label>
<span class="value">
<input readonly="" class="xforms-value" type="text">
</span>
<span class="xforms-required-icon">*</span>
<span class="xforms-alert">
<span class="xforms-alert-icon"> </span>
</span>
</span>
And what I need is to get the input (line 5). I tried a lot, for example:
var elem01 = document.getElementById("pName");
console.log("getElementById: " + elem01);
var elem02 = document.evaluate(".//*[@id='pName']", document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null);
console.log("evaluate: " + elem02);
console.log(elem02.singleNodeValue);
var elem03 = document.querySelector("#pName");
console.log("querySelector: " + elem03);
But none of that allows me to get a reference to the control. What's wrong?
With XPath, the problem seems to be that the XML is not well formed, so document.getElementById("pName") doesn't return anything either.
http://jsfiddle.net/wmzyqqja/7/
The problem with your example is that you are executing your JavaScript before the relevant DOM elements are loaded (i.e. your code is in the head element).
This will fix the example:
window.onload = changeControlValue;
JSFiddle: http://jsfiddle.net/TrueBlueAussie/wmzyqqja/8/
Try this
var elem01 = document.getElementById("pName");
var inp = elem01.getElementsByTagName("input")[0];
(in JSFiddle the "onload" setting is required.)
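The second answer's approach (find the container by id, then the first input inside it) can be mirrored in Python with xml.etree.ElementTree as an offline illustration. It also shows the attribute predicate written as [@id='pName'], with @ rather than #. The markup is an assumed, well-formed simplification of the question's HTML (ElementTree needs the <input> to be self-closed), and this does not reproduce the browser timing issue that the first answer fixes.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed copy of the question's markup
markup = """
<div>
  <span class="xforms-value xforms-control" id="pName">
    <label class="xforms-label">Name:</label>
    <span class="value"><input readonly="" class="xforms-value" type="text"/></span>
  </span>
</div>
"""
root = ET.fromstring(markup.strip())

# Attribute predicates use @, not #
container = root.find(".//*[@id='pName']")
# Then take the first <input> anywhere inside the container
inp = container.find(".//input")
print(inp.get("type"))  # text
```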

Unable to Extract Link with Mojolicious

I am trying to extract the link for the next page in a search results page using Mojo::DOM. However, I have a problem where instead of Mojo::DOM elements, I get a string after using ->find() on an existing element.
I have:
my $pagination_elements = $dom->find("div[class*=\"pagination-block\"]");
my $page_counter_text = $pagination_elements->find("div[class=\"page-of-pages\"]")->text();
$page_counter_text =~ /^Page (\d+) of (\d+)$/;
my $current_page = int($1);
my $last_page = int($2);
my $prev_next_elements = $pagination_elements->find("a[class*=\"prev-next\"]");
my $next_page_link = $prev_next_elements->last();
my $next_page_url = $next_page_link->attr("href");
On each page, there may be 2 link tags with a class of prev-next. Instead of getting the link for the last element, what I get is a string that contains the href for both of the tags (if both are available on the page).
Now, if instead of this I do:
my $next_page_link = $dom->find("div[class*=\"pagination-block\"] > ul > li > a[class*=\"prev-next\"]")->last();
my $next_page_url_rel = $next_page_link->attr("href");
I get the required link.
My question is, why does the second version work and not the first? Why do I have to start from the root DOM element to get a list of elements, and why starting from a child of the root returns a string containing all the link tags instead of just the one I want?
Edit
An example of the HTML I am parsing is:
<div class="pagination-block clearfix">
<div class="page-of-pages">
Page 2 of 100
</div>
<ul class="pagination-links">
<li>
.
.
.
</li>
<li>
<a class="page-option prev-next" href="PREV LINK">Prev</a>
</li>
<li>
<a class="page-option prev-next" href="NEXT LINK">Next</a>
</li>
</ul>
</div>
It would have helped a lot if you could have shown an example of the HTML you are processing. Instead I have imagined this, which I hope is close.
<html>
<head>
<title>Title</title>
</head>
<body>
<div class="pagination-block">
<div class="page-of-pages">Page 99 of 100</div>
<ul>
<li>
<a class="prev-next" href="/page98">Prev</a>
</li>
<li>
<a class="prev-next" href="/page100">Next</a>
</li>
</ul>
</div>
<div class="pagination-block">
<div class="page-of-pages">Page 99 of 100</div>
<ul>
<li>
<a class="prev-next" href="/page98">Prev</a>
</li>
<li>
<a class="prev-next" href="/page100">Next</a>
</li>
</ul>
</div>
</body>
</html>
Now let's look at your code
my $pagination_elements = $dom->find('div[class*="pagination-block"]')
This gives you a Mojo::Collection containing the two instances of div that have a pagination-block class.
my $prev_next_elements = $pagination_elements->find('a[class*="prev-next"]')
This does something like a map, replacing each member of the Mojo::Collection with the results of doing a find on them. Since find returns another Mojo::Collection, you now have a collection of two collections, each with two Mojo::DOM objects. To clarify:
$prev_next_elements is a Mojo::Collection object with a size of 2
Both $prev_next_elements->[0] and $prev_next_elements->[1] are Mojo::Collection objects, each with a size of 2
$prev_next_elements->[0][0], $prev_next_elements->[0][1], $prev_next_elements->[1][0], and $prev_next_elements->[1][1] are all Mojo::DOM objects, each containing an <a> element from the HTML document
my $next_page_link = $prev_next_elements->last
This takes the second element of $prev_next_elements. It is the same as $prev_next_elements->[1], and so is a Mojo::Collection object containing the two Mojo::DOM elements that hold the last two <a> elements in the HTML document.
my $next_page_url = $next_page_link->attr('href')
Now you are doing another map operation: applying attr to both elements of the collection, and returning another collection containing the two href strings /page98 and /page100. Stringifying this Mojo::Collection just concatenates all of its elements and gives you "/page98\n/page100".
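The nested-collection behaviour can be mimicked with plain Python lists, which may make it easier to see why last returns a whole group of links rather than one link. The data mirrors the two pagination blocks in the imagined HTML above, and find_prev_next is a hypothetical stand-in for find, not Mojo's API.

```python
from itertools import chain

# Assumed data: two pagination blocks, each with a Prev and a Next link
pagination_blocks = [
    {"links": [{"href": "/page98"}, {"href": "/page100"}]},
    {"links": [{"href": "/page98"}, {"href": "/page100"}]},
]

def find_prev_next(block):
    """Hypothetical stand-in for ->find('a[class*="prev-next"]') on one block."""
    return block["links"]

# Running the find on every block (what find on a collection does) nests the results
nested = [find_prev_next(block) for block in pagination_blocks]
last_item = nested[-1]  # a whole list of two links, not a single link
print([a["href"] for a in last_item])  # ['/page98', '/page100']

# Option 1: narrow to one block first, then search inside it
next_link = find_prev_next(pagination_blocks[-1])[-1]
print(next_link["href"])  # /page100

# Option 2: flatten the nested result before taking the last element
flat = list(chain.from_iterable(nested))
print(flat[-1]["href"])  # /page100
```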
To fix all this, take the last of the $pagination_elements, giving you a Mojo::DOM object. Then do a find for the prev and next elements, giving you a Mojo::Collection of the "prev" and "next" <a> elements, and finally map those elements to links using attr('href'). You end up with a Mojo::Collection containing the href text of the "prev" and "next" links in the last pagination block.
my $pagination_elements = $dom->find('div[class*="pagination-block"]');
my $last_pagination_element = $pagination_elements->last;
my $prev_next_elements = $last_pagination_element->find('a[class*="prev-next"]');
my $prev_next_links = $prev_next_elements->attr('href');
my ($prev_page_link, $next_page_link) = ($prev_next_links->first, $prev_next_links->last);
say $prev_page_link;
say $next_page_link;
output
/page98
/page100
You can collapse all that to something more convenient, like this
my $pagination_elements = $dom->find('div[class*="pagination-block"]');
my $prev_next_links = $pagination_elements->last->find('a[class*="prev-next"]')->attr('href');
my ($prev_page_link, $next_page_link) = @$prev_next_links;
say $prev_page_link;
say $next_page_link;
If you used Data::Dump (or some equivalent module) instead of print, you would get a clue as to what's going on:
use Data::Dump;
dd $next_page_url;
dd $next_page_url_rel;
Outputs:
bless(["PREV LINK", "NEXT LINK"], "Mojo::Collection")
"NEXT LINK"
As you can see, your first variable actually holds a collection, and not a string.
The problem arises because the Mojo::DOM->find returns a Mojo::Collection:
my $pagination_elements = $dom->find('div[class*="pagination-block"]');
Doing a subsequent find on a collection returns you a nested collection which is not going to perform the way you expect with calls like last.
Here are three different solutions to fix your first attempt to find the link text:
Use the Mojo::DOM->at method to find the first element in DOM structure matching the CSS selector.
my $pagination_elements = $dom->at('div[class*="pagination-block"]');
Use Mojo::Collection->first or ->last to isolate a specific element in the collection before the subsequent find.
my $pagination_elements
= $dom->find('div[class*="pagination-block"]')->last();
Use Mojo::Collection->flatten to flatten the nested collections created by your subsequent find into a new collection with all elements:
my $pagination_elements = $dom->find('div[class*="pagination-block"]');
my $prev_next_elements
= $pagination_elements->find('a[class*="prev-next"]')->flatten();
All of these methods will make your script work as you intended:
use strict;
use warnings;
use Mojo::DOM;
use Data::Dump;
my $dom = Mojo::DOM->new(do { local $/; <DATA> });
# Fix 1
my $pagination_elements = $dom->at('div[class*="pagination-block"]');
# Fix 2
#my $pagination_elements
# = $dom->find('div[class*="pagination-block"]')->last();
# Fix 3
#my $pagination_elements = $dom->find('div[class*="pagination-block"]');
#my $prev_next_elements
# = $pagination_elements->find('a[class*="prev-next"]')->flatten();
my $prev_next_elements = $pagination_elements->find('a[class*="prev-next"]');
my $next_page_link = $prev_next_elements->last();
my $next_page_url = $next_page_link->attr("href");
dd $next_page_url;
$next_page_link = $dom->find('div[class*="pagination-block"] > ul > li > a[class*="prev-next"]')->last();
my $next_page_url_rel = $next_page_link->attr("href");
dd $next_page_url_rel;
__DATA__
<html>
<head>
<title>Paging Example</title>
</head>
<body>
<div class="pagination-block clearfix">
<div class="page-of-pages">
Page 2 of 100
</div>
<ul class="pagination-links">
<li>
.
.
.
</li>
<li>
<a class="page-option prev-next" href="PREV LINK">Prev</a>
</li>
<li>
<a class="page-option prev-next" href="NEXT LINK">Next</a>
</li>
</ul>
</div>
</body>
</html>
Outputs:
"NEXT LINK"
"NEXT LINK"

How to use "this" and not "this" selectors in jQuery

I have 4 divs with content like below:
<div class="prodNav-Info-Panel">content</div>
<div class="prodNav-Usage-Panel">content</div>
<div class="prodNav-Guarantee-Panel">content</div>
<div class="prodNav-FAQ-Panel">content</div>
And a navigation list like this:
<div id="nav">
<ul id="navigation">
<li><a class="prodNav-Info" ></a></li>
<li><a class="prodNav-Usage" ></a></li>
<li><a class="prodNav-Guarantee"></a></li>
<li><a class="prodNav-FAQ" ></a></li>
</ul>
</div>
When the page is first displayed I show all the content by executing this:
$('div.prodNav-Usage-Panel').fadeIn('slow');
$('div.prodNav-Guarantee-Panel').fadeIn('slow');
$('div.prodNav-FAQ-Panel').fadeIn('slow');
$('div.prodNav-Info-Panel').fadeIn('slow');
Now, when you click the navigation list item it reveals the clicked content and hides the others, like this:
$('.prodNav-Info').click( function() {
$('div.prodNav-Info-Panel').fadeIn('slow');
$('div.prodNav-Usage-Panel').fadeOut('slow');
$('div.prodNav-Guarantee-Panel').fadeOut('slow');
$('div.prodNav-FAQ-Panel').fadeOut('slow');
});
So what I have is 4 separate functions because I do not know which content is currently displayed. I know this is inefficient and can be done with a couple of lines of code. It seems like there is a way of saying: when this is clicked, hide the rest.
Can I do this with something like $(this) and $(not this)?
Thanks,
Erik
In your particular case you may be able to use the .siblings() method, something like this:
$(this).fadeIn().siblings().fadeOut()
Otherwise, let's say that you have a set of elements stored somewhere that points to all of your elements:
// contains all of the panel elements:
var $hiders = $(".prodNavPanel");
// somewhere later:
$hiders.not("#someElement").fadeOut();
$("#someElement").fadeIn();
Also, I would suggest changing the classes for your <div> and <a> to something more like:
<div class="prodNavPanel" id="panel-Info">content</div>
....
<a class="prodNavLink" href="#panel-Info">info</a>
This gives you a few advantages over your HTML. First: the links will have useful hrefs. Second: You can easily select all your <div>/<a> tags. Then you can do this with jQuery:
$(function() {
var $panels = $(".prodNavPanel");
$(".prodNavLink").click(function() {
var m = this.href.match(/(#panel.*)$/);
if (m) {
var panelId = m[1];
$panels.not(panelId).fadeOut();
$(panelId).fadeIn();
return false; // prevents browser from "moving" the page
}
});
});
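The same "show the clicked panel, hide all the others" logic, stripped of jQuery and the DOM, can be sketched in Python: extract the panel id from the link's href with the same regular expression, then treat every other panel as "not this". The panel ids other than panel-Info and the example URL are assumptions that follow the naming suggested above.

```python
import re

# Assumed panel ids, following the "panel-Info" naming suggested above
PANEL_IDS = ["#panel-Info", "#panel-Usage", "#panel-Guarantee", "#panel-FAQ"]

def panels_to_toggle(href, panel_ids=PANEL_IDS):
    """Return (panel to fade in, panels to fade out) for a clicked link's href."""
    m = re.search(r"(#panel.*)$", href)
    if not m:
        return None, []
    target = m.group(1)
    # "not this": every panel except the one that was clicked
    others = [p for p in panel_ids if p != target]
    return target, others

# A browser's link.href is absolute, hence the full (hypothetical) URL
show, hide = panels_to_toggle("https://example.com/product#panel-Usage")
print(show)  # #panel-Usage
print(hide)  # ['#panel-Info', '#panel-Guarantee', '#panel-FAQ']
```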