BeautifulSoup: get tag name of element itself, not its children

I have the below (simplified) code, which uses the following source:
<html>
<p>line 1</p>
<div>
<a>line 2</a>
</div>
</html>
from bs4 import BeautifulSoup
soup = BeautifulSoup('<html><p>line 1</p><div><a>line 2</a></div></html>', 'html.parser')
ele = soup.find('p').nextSibling
# somehow print the tag name of ele here
I want to get the tag name of ele, in this case "div". However, I only seem to be able to get the names of its children. Am I missing something simple? I thought I could use ele.tag.name, but that raises an exception because ele.tag is None.
# Below correctly prints the div element "<div><a>line 2</a></div>"
print(ele)
# Below prints "None"; ele.tag.name raises an exception because ele.tag is None
print(ele.tag)
# Below prints "a", the name of ele's child
allTags = ele.findAll(True)
for e in allTags:
    print(e.name)
At this point, I am considering something along the lines of getting the parent of ele, getting the tags of the parent's children, counting how many preceding siblings ele has, and counting down to the correct child tag. That seems ridiculous.

ele is already a Tag, so you can read its name directly:
from bs4 import BeautifulSoup
soup = BeautifulSoup('<html><p>line 1</p><div><a>line 2</a></div></html>', 'html.parser')
print(soup.find('p').nextSibling.name)
So in your example it would simply be:
print(ele.name)
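A slightly fuller sketch, assuming bs4 (BeautifulSoup 4) with Python's built-in "html.parser". It also shows a common trap: with indented markup like the source at the top of the question, .next_sibling of the <p> is the "\n" text node between the tags, not the <div>. find_next_sibling() skips text nodes and returns the next Tag. The helper name sibling_tag_name is hypothetical.

```python
from bs4 import BeautifulSoup

# Assumption: bs4 with Python's built-in "html.parser"
compact = "<html><p>line 1</p><div><a>line 2</a></div></html>"
indented = """<html>
<p>line 1</p>
<div>
<a>line 2</a>
</div>
</html>"""


def sibling_tag_name(html):
    """Return the name of the first element (not text) sibling after the first <p>."""
    soup = BeautifulSoup(html, "html.parser")
    p = soup.find("p")
    # find_next_sibling() skips text nodes such as "\n" and returns a Tag or None
    sibling = p.find_next_sibling()
    return sibling.name if sibling is not None else None


ele = BeautifulSoup(compact, "html.parser").find("p").next_sibling
print(ele.name)  # div: with no whitespace, the next sibling is the <div> Tag itself

# With the indented markup, next_sibling is the "\n" text node, not the <div>
print(repr(str(BeautifulSoup(indented, "html.parser").find("p").next_sibling)))
print(sibling_tag_name(indented))  # div
```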

You can access an element's attributes as if it were a dictionary.
Let's say you have an element like this one:
<input id="__VIEWSTATE3" name="__VIEWSTATE3" type="hidden" value="MwqzeTH4"/>
You can access each attribute like this:
print(elem["id"])
# prints __VIEWSTATE3
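A small sketch putting both answers together, assuming bs4 with "html.parser" and the <input> markup above: .name gives the element's own tag name, square brackets give an attribute (raising KeyError if it is absent), .get() returns None instead, and .attrs holds every attribute as a dict.

```python
from bs4 import BeautifulSoup

# Assumption: bs4 with Python's built-in "html.parser"
html = '<input id="__VIEWSTATE3" name="__VIEWSTATE3" type="hidden" value="MwqzeTH4"/>'
elem = BeautifulSoup(html, "html.parser").find("input")

print(elem.name)          # input (the element's own tag name)
print(elem["id"])         # __VIEWSTATE3 (raises KeyError if the attribute is missing)
print(elem.get("value"))  # MwqzeTH4
print(elem.get("class"))  # None (.get() does not raise for a missing attribute)
print(elem.attrs)         # every attribute as a dict
```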

print(soup.find('h1',id_='pdp_product_title'))
It doesn't print the element's details. Please help me solve this. The element I am looking for is:
<h1 id="pdp_product_title" class="headline-2 css-zis9ta" data-test="product-title">Nike Air Force 1 Shadow</h1>
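One likely cause, assuming bs4: only class_ gets the trailing-underscore treatment, because class is a Python keyword. id is not a keyword, so id_='pdp_product_title' searches for an attribute literally named "id_", finds nothing, and find() returns None. Passing id= (or an attrs dict) matches the <h1> shown above. This sketch parses that markup directly with "html.parser".

```python
from bs4 import BeautifulSoup

html = ('<h1 id="pdp_product_title" class="headline-2 css-zis9ta" '
        'data-test="product-title">Nike Air Force 1 Shadow</h1>')
soup = BeautifulSoup(html, "html.parser")

# id_ is not special: this looks for an attribute named "id_", so it returns None
print(soup.find("h1", id_="pdp_product_title"))

# id is an ordinary keyword argument, so this matches the <h1>
title = soup.find("h1", id="pdp_product_title")
print(title.get_text(strip=True))  # Nike Air Force 1 Shadow

# class_ (with the underscore) is the special case, because class is reserved
print(soup.find("h1", class_="headline-2").get_text())

# Attribute names with hyphens cannot be keyword arguments, so use attrs
print(soup.find("h1", attrs={"data-test": "product-title"}).get_text())
```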

Related

Getting all tag elements in an array with SwiftSoup

I worked on a project in Python using BeautifulSoup to parse an HTML document and add ruby and rt tags to each string. Recently I've been working on a similar project for a personal iOS app. I found SwiftSoup, which is similar, but ran into a problem parsing a tag, something I was able to do beautifully with BeautifulSoup. In BeautifulSoup I can take a tag like the one below:
<p id="p6" data-pid="6" data-rel-pid="[41]" class="p6">
<span class="parNum" data-pnum="1"></span>
This is a(<span id="citationsource2"></span><a epub:type="noteref" href="#citation2">link</a>)to some website。
</p>
Using .contents from BS4, I can get the tag's children into a list like this:
['\n', <span class="parNum" data-pnum="1"></span>, '\n This is a(', <span id="citationsource2"></span>, <a epub:type="noteref" href="#citation2">link</a>, ')to some website。\n ']
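For comparison, a sketch of the BS4 side, assuming bs4 with "html.parser" and the <p> from the question (with the leading indentation removed): .contents is a plain Python list of the tag's direct children, mixing NavigableString text nodes and Tag objects in document order, so text and child tags can be handled separately.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

html = """<p id="p6" data-pid="6" data-rel-pid="[41]" class="p6">
<span class="parNum" data-pnum="1"></span>
This is a(<span id="citationsource2"></span><a epub:type="noteref" href="#citation2">link</a>)to some website。
</p>"""

p = BeautifulSoup(html, "html.parser").find("p")

# .contents lists the <p>'s direct children: Tags and text nodes, in order
for node in p.contents:
    if isinstance(node, Tag):
        print("tag: ", node.name, repr(node.get_text()))
    elif isinstance(node, NavigableString):
        print("text:", repr(str(node)))
```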
I then go through the list, checking whether each child tag has text or whether the element is itself a text node, and wrap the words in ruby tags. The result is this:
<p id="p6" data-pid="6" data-rel-pid="[41]" class="p6">
<span class="parNum" data-pnum="1"></span>
<ruby>This<rt>1</rt></ruby><ruby>is<rt>2</rt></ruby> <ruby>a<rt>3</rt></ruby>(<span id="citationsource2"></span><a epub:type="noteref" href="#citation2"><ruby>link<rt>4</rt></ruby></a>)<ruby>to<rt>5</rt></ruby> <ruby>some<rt>6</rt></ruby> <ruby>website<rt>7</rt></ruby>。
</p>
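The word-numbering step described above can be sketched for individual text nodes with only the Python standard library. Assumptions: a "word" is a run of ASCII letters (punctuation such as "(" or "。" is left as-is), numbering continues across nodes through one shared counter, and add_ruby is a hypothetical helper, not part of BS4 or SwiftSoup. If the surrounding text can contain "<" or "&", it would need escaping before being written back as markup.

```python
import itertools
import re

WORD = re.compile(r"[A-Za-z]+")


def add_ruby(text, counter):
    """Wrap each word in <ruby>word<rt>n</rt></ruby>, taking n from counter."""
    return WORD.sub(lambda m: f"<ruby>{m.group(0)}<rt>{next(counter)}</rt></ruby>", text)


counter = itertools.count(1)  # shared, so numbering continues across nodes
# The text nodes and the <a> tag's text, in document order
print(add_ruby("This is a(", counter))
print(add_ruby("link", counter))
print(add_ruby(")to some website。", counter))
```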
With SwiftSoup I parse the document as follows, since it doesn't seem to have a method like BS4's .contents:
let soup:Document = try! SwiftSoup.parse(html)
let elements:Elements = try! soup.select("p")
for j in try! elements.html(){
print(try! j)
// Doesn't work: prints every single character, not every element
}
The problem is that it treats the whole content of the p tag as one element; it doesn't separate the child nodes of the p tag the way BS4 does. I looked at the documentation but didn't see anything about splitting a tag's contents into an array.
This is what I want to achieve with Swiftsoup
['\n', <span class="parNum" data-pnum="1"></span>, '\n This is a(', <span id="citationsource2"></span>, <a epub:type="noteref" href="#citation2">link</a>, ')to some website。\n ']
But I end up getting everything as one element in the array instead of separate elements.
[<span class="parNum" data-pnum="1"></span>This is a(<span id="citationsource2">
</span> <a epub:type="noteref" href="#citation2">link</a>)to some website.]
Is there any way of achieving this using SwiftSoup or another Swift HTML parser?
After looking at the SwiftSoup source files, I found the answer to my question. SwiftSoup has a method called getChildNodes() that returns all child nodes of the specified element (text nodes as well as elements) as an array. Hope this helps anyone who has faced a similar problem.
let soup:Document = try! SwiftSoup.parseBodyFragment(html)
let p : Elements = try! soup.select("p")
for j in p{
print(try! j.getChildNodes())
}

Regex: Capture Groups and Empty Fields (SWIFT 5 | ICU Regex Engine)

I need some help correcting my regex. I have a large body of HTML as a string, and I need to pattern-match it so that the data I have nested within <div> tags can be extracted and used.
Let's take the test case <div id=1>:
<div id=1>UID:1currentPartNumber:63222TRES003H1workcenter:VLCSKDcycleTime:98.8curPartCycleTime:63.66partsMade:233curCycleTimeActual:62.4target:291actual:233downtime:97statusReason:lineStatus:Productionefficiency:80.05plusminus:-260curProdTime:7/16/2019 12:28:01 PM</div>
Note that lineStatus can either have a value or be empty, and the same is true of statusReason.
I am able to come up with a regex that does MOST of the work but I am struggling with cases where values are not present.
Here is my attempt:
(
(<div id=(\d|\d\d)>)
(UID:(\d|\d\d))
(currentPartNumber:(.{1,20}))
(workcenter:(.{1,20}))
(cycleTime:(.{1,6}))
(curPartCycleTime:(.{1,6}))
(partsMade:(.{1,6}))
(CycleTimeActual:(.{1,6}))
(target:(.{1,6}))
(actual:(.{1,6}))
(downtime:(.{1,6}))
((statusReason:((?:.)|(.{1,6}))))
((lineStatus:((?:.)|(.{1,6}))))
(Productionefficiency:(.{1,6}))
(plusminus:(.{1,6}))
(curProdTime:(.{1,30}))
)
I split it across lines just for readability.
Thanks,
You are very, very close.
If you use:
(
(<div id=\d{1,2}>)
(UID:\d{1,2})
(currentPartNumber:(.{1,20}))
(workcenter:(.{1,20}))
(cycleTime:(.{1,6}))
(curPartCycleTime:(.{1,6}))
(partsMade:(.{1,6}))
(CycleTimeActual:(.{1,6}))
(target:(.{1,6}))
(actual:(.{1,6}))
(downtime:(.{1,6}))
(statusReason:(.{0,6}))
(lineStatus:(.{0,6}))
(Productionefficiency:(.{1,6}))
(plusminus:(.{1,6}))
(curProdTime:(.{1,30}))
(<\/div>)
)
Then $3\n$4\n$6\n$8\n$10\n$12\n$14\n$16\n$18\n$20\n$22\n$24\n$26\n$28\n$30 will be:
UID:1
currentPartNumber:63222TRES003H1
workcenter:VLCSKD
cycleTime:98.8
curPartCycleTime:63.66
partsMade:233cur
CycleTimeActual:62.4
target:291
actual:233
downtime:97
statusReason:
lineStatus:
Productionefficiency:80.05
plusminus:-260
curProdTime:7/16/2019 12:28:01 PM
By using (statusReason:(.{0,6}))(lineStatus:(.{0,6})) you make the value of statusReason and lineStatus truly optional.
I also simplified the start <div> and UID detection.
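To check the pattern outside the ICU engine, here is a sketch using Python's re module, which treats the constructs used here (.{m,n}, \d{1,2}, capture groups) the same way for this input. Assumptions: the pattern is built from a field list for readability, parse_status_div is a hypothetical helper, and the field is written curCycleTimeActual, its full name in the sample data. The pattern above matches only CycleTimeActual, which is why partsMade captures "233cur".

```python
import re

# (field name, value pattern); statusReason and lineStatus use {0,6}
# so that an empty value still matches.
FIELDS = [
    ("UID", r"\d{1,2}"),
    ("currentPartNumber", r".{1,20}"),
    ("workcenter", r".{1,20}"),
    ("cycleTime", r".{1,6}"),
    ("curPartCycleTime", r".{1,6}"),
    ("partsMade", r".{1,6}"),
    ("curCycleTimeActual", r".{1,6}"),  # assumption: full field name
    ("target", r".{1,6}"),
    ("actual", r".{1,6}"),
    ("downtime", r".{1,6}"),
    ("statusReason", r".{0,6}"),
    ("lineStatus", r".{0,6}"),
    ("Productionefficiency", r".{1,6}"),
    ("plusminus", r".{1,6}"),
    ("curProdTime", r".{1,30}"),
]

PATTERN = re.compile(
    r"<div id=\d{1,2}>"
    + "".join(f"{name}:({value})" for name, value in FIELDS)
    + r"</div>"
)


def parse_status_div(html):
    """Return a dict of field -> captured value, or None if nothing matches."""
    match = PATTERN.search(html)
    if match is None:
        return None
    return dict(zip((name for name, _ in FIELDS), match.groups()))


sample = ("<div id=1>UID:1currentPartNumber:63222TRES003H1workcenter:VLCSKD"
          "cycleTime:98.8curPartCycleTime:63.66partsMade:233"
          "curCycleTimeActual:62.4target:291actual:233downtime:97"
          "statusReason:lineStatus:Productionefficiency:80.05"
          "plusminus:-260curProdTime:7/16/2019 12:28:01 PM</div>")

for field, value in parse_status_div(sample).items():
    print(f"{field}:{value}")
```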
Try Regex: ((<div id=(\d|\d\d)>)(UID:(\d|\d\d))(currentPartNumber:(.{1,20}))(workcenter:(.{1,20}))(cycleTime:(.{1,6}))(curPartCycleTime:(.{1,6}))(partsMade:(.{1,6}))(CycleTimeActual:(.{1,6}))(target:(.{1,6}))(actual:(.{1,6}))(downtime:(.{1,6}))(statusReason:(.{1,6})?)(lineStatus:(.{1,6})?)(Productionefficiency:(.{1,6}))(plusminus:(.{1,6}))(curProdTime:(.{1,30})))
Warning: you can't reliably parse HTML with regex.

Umbraco 7 Mismatched { and }?

I am using a partial view to list the top 5 children of a specific node.
This works, but only if I put a div before the foreach, e.g.:
@inherits Umbraco.Web.Mvc.UmbracoTemplatePage
<div class="title">Test</div>
<ul>
@{
    var ow = @owCore.Initialise(1085);
    <div> </div>
    var node = Umbraco.Content(1105);
    foreach (var item in node
        .Children.Where("Visible")
        .OrderBy("Id descending")
        .Take(5)
    )
    {
        <li>@item.pageTitle</li>
    }
}
</ul>
produces the expected unordered list.
However, if I remove the empty div:
@inherits Umbraco.Web.Mvc.UmbracoTemplatePage
Test
<ul>
@{
    var ow = @owCore.Initialise(1085);
    var node = Umbraco.Content(1105);
    foreach (var item in node
        .Children.Where("Visible")
        .OrderBy("Id descending")
        .Take(5)
    )
    {
        <li>@item.pageTitle</li>
    }
}
The error I get is:
Compiler Error Message: CS1513: } expected
Source Error:
Line 113: }
Line 114: }
Line 115: }
Clearly it looks like there are too few closing '}'.
Presumably the div forces the closing }?
I have checked owCore (a library of functions I am building in App_Code). I have stripped it back so that it now does nothing, just to make sure the curly brackets are matched:
@using Umbraco
@using Umbraco.Core.Models
@using Umbraco.Web
@functions{
    public static int Initialise(int siteDocID){
        return 0;
    }
}
However, if I remove the @owCore code from the partial view:
@inherits Umbraco.Web.Mvc.UmbracoTemplatePage
Test
<ul>
@{
    var node = Umbraco.Content(1105);
    foreach (var item in node
        .Children.Where("Visible")
        .OrderBy("Id descending")
        .Take(5)
    )
    {
        <li>@item.pageTitle</li>
    }
}
</ul>
All is ok again.
Does that mean it's definitely an issue with owCore, or is something else triggering the mismatched {}?
I have checked the template calling this partial view and can't find a problem.
This doesn't make sense. Can anyone explain?
Thanks!
This is actually more of a Razor question.
You start your code block with @{, so you don't need the @ in front of owCore. Removing it will make the view render even without the <div>, because the Razor parser is no longer confused by the @.

How to add links in Wicket

In a Java class I need to add a list (e.g. list.getFirstName()) to a Label in Wicket, and each first name should be a hyperlink in the HTML. Below are my Java and HTML code.
The HTML:
<tr >
<a wicket:id="gotoClass">
<span wicket:id="firstname"></span>
</a>
</tr>
The Java class:
Iterator<String> brds = list.iterator();
RepeatingView repeating = new RepeatingView("repeating");
add(repeating);
while (brds.hasNext())
{
AbstractItem item = new AbstractItem(repeating.newChildId());
repeating.add(item);
String contact = brds.next();
item.add(new Label("firstname", contact));
}
The above code works: if I have 10 first names in the list, I get 10 labels in the HTML. But when I add the anchor tag to the HTML and
form.add(new BookmarkablePageLink<String>("firstname", gotoClass.class));
in Java, I get the exception below:
Last cause: Unable to find component with id 'firstname' in [ [Component id = formname]]
Expected: 'formname.firstname'.
Found with similar names: ''
Can anybody help me with this?
Regards
Sharath
The link must be a child of the repeating item (as the label was in your first version), and the label must be a child of the link, mirroring the markup. For example:
//...
BookmarkablePageLink<String> link = new BookmarkablePageLink<String>("gotoClass", gotoClass.class);
item.add(link);
link.add(new Label("firstname", contact));

How to get the value of the data item to use in {{if}}?

I am trying to get the value of the data item to use in an {{if}} tag, but cannot get it to work. How do I get that value? You can see the full code here: http://jsfiddle.net/epitka/BhYvh/
<script id="contentHeaderTemplate" type="text/x-jquery-tmpl">
{{if($data.Id===1)}} Create New Order{{else}}Edit Order {{/if}}
<br />
</script>
Drop the parentheses.
{{if $data.Id === 1 }}
http://jsfiddle.net/mattball/KdqZF/
See also the {{if}} template tag API page.
You don't need the parentheses in that if statement. Try:
{{if $data.Id === 1 }}