DOM xpath html extraction

I am importing some HTML into a DOM document and using XPath to extract the part of the HTML I am interested in. See below:
$dom = new DOMDocument();
// $myHtmlFileHere holds the HTML string (use loadHTMLFile() for a path); @ hides parser warnings
@$dom->loadHTML($myHtmlFileHere);
$xpath_dom_doc = new DOMXPath($dom);
$dom_object = $xpath_dom_doc->query('myPathHere');
Below is the HTML structure that is "returned":
<div>GROUP A</div>
<span>aaa</span>
<span>zzz awesome</span>
<span>eee</span>
<div>GROUP B</div>
<span>fff</span>
<div>GROUP C</div>
<span>zzz</span>
<span>uuu</span>
<span>iii</span>
<span>rrr</span>
As you can see, I have categories (GROUP A, GROUP B and GROUP C). The spans below each category hold information related to that category. What I would like is to send each span's content, together with its category, to a database. The problem I am facing is that the category's div tag does not wrap the spans, so I don't see how to manage that. I hope someone can help. Thank you in advance. Cheers, Marc

What about the following-sibling XPath axis? You should be careful, though, to select only the siblings up to the next div.
For example, tested with xsh2 (this lists the spans of the second group, GROUP B):
$div = //div[2] ;
ls $div/following-sibling::span[count(preceding-sibling::div)=1+count($div/preceding-sibling::div)] ;
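For readers outside xsh2, here is a minimal Java sketch of the same counting idea using only the JDK's javax.xml.xpath API. It assumes the fragment is well-formed and wrapped in a hypothetical <root> element (real-world HTML would first go through an HTML-tolerant parser such as PHP's DOMDocument::loadHTML); the class and method names are illustrative, not from the original answer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class GroupSpans {
    // Maps each category <div> text to the texts of the <span>s that follow it,
    // stopping at the next <div>. Expects a well-formed fragment wrapped in <root>.
    static Map<String, List<String>> group(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList divs = (NodeList) xp.evaluate("/root/div", doc, XPathConstants.NODESET);
            Map<String, List<String>> groups = new LinkedHashMap<>();
            for (int k = 1; k <= divs.getLength(); k++) {
                // A span belongs to the k-th div when exactly k divs precede it
                // (the answer's 1 + count($div/preceding-sibling::div)).
                String expr = "/root/div[" + k + "]/following-sibling::span"
                        + "[count(preceding-sibling::div) = " + k + "]";
                NodeList spans = (NodeList) xp.evaluate(expr, doc, XPathConstants.NODESET);
                List<String> texts = new ArrayList<>();
                for (int i = 0; i < spans.getLength(); i++) {
                    texts.add(spans.item(i).getTextContent());
                }
                groups.put(divs.item(k - 1).getTextContent(), texts);
            }
            return groups;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not group spans", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><div>GROUP A</div><span>aaa</span><span>zzz awesome</span>"
                + "<span>eee</span><div>GROUP B</div><span>fff</span><div>GROUP C</div>"
                + "<span>zzz</span><span>uuu</span><span>iii</span><span>rrr</span></root>";
        // Prints {GROUP A=[aaa, zzz awesome, eee], GROUP B=[fff], GROUP C=[zzz, uuu, iii, rrr]}
        System.out.println(group(xml));
    }
}
```

Each category/span pair in the map can then be written to the database one row at a time. If you process one span at a time instead, preceding-sibling::div[1] selects the nearest category div above that span, which is a shorter way to find its group.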

Related

Is it possible to find and store element's location by text in selenium ide?

I need to create the element and then delete it. Is there a way to find the element by its text after it was created?
The XPath of the element is //div[@id='mif-tree-6']/span/span[3].

You can use XPath for it, for example:
//div[@id='mif-tree-6']//span[contains(text(),'your_text_here')]
UPDATE
Please provide an example of your HTML. It is possible to find the parent of your element with XPath and then find all of its children. For example, if your HTML is:
<div id='lol'>
<div>first_item</div>
<div>second_item</div>
<div>third_element</div>
</div>
You get all of the sibling divs with the XPath:
//div[contains(text(),'first_')]/../div
So you can do something like:
click | //div[contains(text(),'first_')]/../div[2]
BUT if there are a lot of sibling elements to reach from the text of one sibling, you will need a loop to visit each of them.
Once again, if you provide full information about what you are doing and an example of your HTML, it will be much easier to suggest something.
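The XPath expressions from the UPDATE can be checked outside the browser. This Java sketch (JDK javax.xml.xpath; the class name FindByText is made up) runs them against the sample markup: the text match finds first_item, `..` steps up to the parent div, `/div` returns all three children, and `/div[2]` picks a single one. Selenium IDE evaluates XPath in the browser instead, so treat this only as a way to reason about the expressions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FindByText {
    // Returns the text content of every node matched by the XPath expression.
    static List<String> texts(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xp.evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + expr, e);
        }
    }

    public static void main(String[] args) {
        String html = "<div id='lol'><div>first_item</div><div>second_item</div>"
                + "<div>third_element</div></div>";
        // Find by text, step up to the parent, then take all div children.
        System.out.println(texts(html, "//div[contains(text(),'first_')]/../div"));
        // prints [first_item, second_item, third_element]
        // A position predicate on the same path picks one sibling.
        System.out.println(texts(html, "//div[contains(text(),'first_')]/../div[2]"));
        // prints [second_item]
    }
}
```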

Need to find the tags under a tag in an XML using jQuery

I have this XML as part of the responseXml of an Ajax call:
<banner-ad>
<title><span style="color:#ffff00;"><strong>Title</strong></span></title>
</banner-ad>
When I use jQuery(responseXml).find("title").text(); the result is "Title".
I also tried jQuery(responseXml).find("title:first-child"), but the result is [object Object].
I want to get the result:
<span style="color:#ffff00;"><strong>Title</strong></span>
Please let me know how to do this in jQuery.
Thanks in advance for any help.
Regards,
Racs

Your problem is that you cannot simply append nodes from one document (the XML response) to another (your HTML page). The issue is two-fold:
You can use jQuery to append nodes from the XML document to the HTML page. This works; the nodes appear in the HTML DOM, but they stay XML nodes and therefore the browser ignores the style attribute, for example. Consequently the text will not be yellow (#ffff00).
As far as I can see, jQuery offers no built-in way to get the XML string (i.e. a serialized node) from an XML node. jQuery can handle XML documents quite well, but there is no equivalent to what .html() does in HTML documents.
So to make this work, we need to extract the XML string from the XML document. Some browsers (namely IE) support the .xml property on XML nodes; the others provide an XMLSerializer object:
// find the proper XML node
var $title = $(doc).find("title");
// either use .xml or, when unavailable, an XMLSerializer
var html = $title[0].xml || (new XMLSerializer()).serializeToString($title[0]);
// result:
// '<title><span style="color:#ffff00;"><strong>Title</strong></span></title>'
Then we have to feed this HTML string to jQuery so new, real HTML elements can be created from it:
$("#target").append(html);
There is a fiddle to show this in action: http://jsfiddle.net/Tomalak/QWHj8/. This example also gets rid of the superfluous <title> element.
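The fiddle is browser JavaScript; for anyone doing the same step server-side, a rough Java analogue of XMLSerializer is the JDK's identity Transformer. This sketch assumes well-formed XML and, like the fiddle, drops the <title> wrapper by serializing only its children; InnerXml and innerXml are made-up names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class InnerXml {
    // Serializes the children of the first element named `tag` ("inner XML").
    static String innerXml(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node element = doc.getElementsByTagName(tag).item(0);
            // An identity Transformer acts as a serializer; skip the <?xml ...?> header.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            // Serialize each child in turn so the wrapper element itself is left out.
            for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
                t.transform(new DOMSource(child), new StreamResult(out));
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not serialize <" + tag + ">", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<banner-ad><title><span style=\"color:#ffff00;\">"
                + "<strong>Title</strong></span></title></banner-ad>";
        System.out.println(innerXml(xml, "title"));
    }
}
```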
Anyway, if you have a chance to influence the XML itself, it would make sense to change it:
<banner-ad>
<title>&lt;span style="color:#ffff00;"&gt;&lt;strong&gt;Title&lt;/strong&gt;&lt;/span&gt;</title>
</banner-ad>
Just XML-encode the payload of <title> and you can do this in jQuery:
$("#target").append( $(doc).find("title").text() );
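The effect of XML-encoding the payload can be checked with a few lines of Java (JDK DOM parser; the class name EncodedPayload is hypothetical): once encoded, the plain text content of <title>, which is what jQuery's .text() returns, is already the markup string.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EncodedPayload {
    // With the payload XML-encoded, the text content of <title> is the markup itself.
    static String titleText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("title").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read <title>", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<banner-ad><title>&lt;span style=\"color:#ffff00;\"&gt;"
                + "&lt;strong&gt;Title&lt;/strong&gt;&lt;/span&gt;</title></banner-ad>";
        // Prints <span style="color:#ffff00;"><strong>Title</strong></span>
        System.out.println(titleText(xml));
    }
}
```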

This would probably work:
$(responseXml).find("title").html();

Zend_Navigation: How to add numberings in the menu items?

I am generating XML from database records, then feeding it to Zend_Navigation to render it as a treeview. Before rendering, I would like to add level numbers, like TOC numbering:
I have:
$partial = array('partials/menu.phtml', 'default');
$this->navigation()->menu()->setPartial($partial);
echo $this->navigation()->menu()->setUlClass('treeview')->render();
The output is wrapped in ul/li (I need the ul for the treeview):
My First Web Page
Nice Page
Main Help
Works
But I need:
1. My First Web Page
1.1 Nice Page
1.1.1 Main Help
1.2 Works
How can I prefix each level with its number? This is as far as I got:
$navarray=$this->navigation()->menu()->toArray();
$it = new RecursiveIteratorIterator(new RecursiveArrayIterator($navarray[0]), RecursiveIteratorIterator::SELF_FIRST);
foreach ($it as $row) {
/// ????
}
Thanks, Arman.

Maybe you could modify the partial to render an ol instead of ul, and then use some CSS magic to render the numbering properly.
You can look at example #48 in the Menu helper documentation for some inspiration.
EDIT:
If you need to use the ul tag, then probably you'll need to add the "current depth" of the menu items by hand. There is a very similar question answered here: PHP RecursiveIteratorIterator: Determining first and last item at each branch level.
Hope that helps,
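The "current depth by hand" idea boils down to keeping one counter per level while walking the items parent-first. In PHP the depth would come from RecursiveIteratorIterator::getDepth() during a SELF_FIRST walk, but the raw depth of an array walk may not equal the menu level (child pages sit inside nested arrays), so it may need adjusting. The Java sketch below assumes the menu has already been reduced to hypothetical (depth, label) pairs in that order and shows only the numbering step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TocNumbering {
    // One menu item as a parent-first walk visits it: depth (0 = top level) and label.
    static final class Entry {
        final int depth;
        final String label;

        Entry(int depth, String label) {
            this.depth = depth;
            this.label = label;
        }
    }

    // Prefixes each label with its outline number: 1, 1.1, 1.1.1, 1.2, ...
    static List<String> number(List<Entry> entries) {
        List<Integer> counters = new ArrayList<>(); // counters.get(d) = position at depth d
        List<String> out = new ArrayList<>();
        for (Entry e : entries) {
            // Going back up: forget the counters of deeper levels.
            while (counters.size() > e.depth + 1) {
                counters.remove(counters.size() - 1);
            }
            // Going down: start each new level at 0.
            while (counters.size() < e.depth + 1) {
                counters.add(0);
            }
            counters.set(e.depth, counters.get(e.depth) + 1);
            StringBuilder num = new StringBuilder();
            for (int i = 0; i < counters.size(); i++) {
                if (i > 0) {
                    num.append('.');
                }
                num.append(counters.get(i));
            }
            out.add(num + " " + e.label);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Entry> menu = Arrays.asList(
                new Entry(0, "My First Web Page"),
                new Entry(1, "Nice Page"),
                new Entry(2, "Main Help"),
                new Entry(1, "Works"));
        // Prints 1 My First Web Page / 1.1 Nice Page / 1.1.1 Main Help / 1.2 Works
        for (String line : number(menu)) {
            System.out.println(line);
        }
    }
}
```

The same counters could be emitted as a prefix inside the existing ul/li partial, so the treeview markup stays unchanged.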

Need to print out all links on a sidebar in selenium (xpath?)

I need to find any extra links and print them out. I started by doing:
get_xpath_count('//li/a')
and comparing it to the size of an array that holds the names of all the links for the sidebar. When the count is too high or too low, I need to print out all the extra or missing links. I would like to make a list of the names so I can compare it to the array. I've tried a few things like get_text('//li/a'), which returns the name of the first link. get_text('//li/a[1]') does the same, but any other index returns nothing.
Any ideas? Also, I need the name that's displayed on the link, not the actual href.
Edit: Also, I'm pretty new to Selenium and XPath. Please let me know if I left out any info that is needed, or if you have any suggestions about the way I'm going about this.

I have been able to get this to work using CSS element locators. Since I use CSS selectors far more often than XPath, I find it easier to always use them with Selenium as well. Put the index on the li, since each a is the only child of its li:
$selenium->get_text("css=li:nth-child(1) a")
$selenium->get_text("css=li:nth-child(2) a")
$selenium->get_text("css=li:nth-child(...) a")
$selenium->get_text("css=li:nth-child(n) a")

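Use:
(//li/a)[$someNumber]
This selects the $someNumber-th //li/a element in the whole document; pass it to get_text to read its link text. The parentheses matter: //li/a[$someNumber] means "the $someNumber-th a inside each li", which is why //li/a[2] returns nothing when every li holds a single link.
To know which values to substitute for $someNumber, you need the total count of these elements:
count(//li/a)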

This is in Java; you can use the same concept in Perl:
int totCountInPage = selenium.getXpathCount("//li/a").intValue();
for (int count = 1; count <= totCountInPage; count++) {
    // Concatenate the loop index into the locator; (//li/a)[n] is the n-th link overall.
    System.out.println(selenium.getText("xpath=(//li/a)[" + count + "]"));
}
This should print the text of every link inside the li tags.
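The count-then-index loop can also be tried offline. This Java sketch (JDK javax.xml.xpath, assuming a well-formed sidebar fragment; the class, its methods and the expected list are hypothetical) reads each link's displayed text with (//li/a)[i] and then reports both the extra and the missing names, which is the comparison the question asks for.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class SidebarLinks {
    // Reads the displayed text (not the href) of every //li/a, one index at a time.
    static List<String> linkTexts(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xp = XPathFactory.newInstance().newXPath();
            int total = ((Double) xp.evaluate("count(//li/a)", doc, XPathConstants.NUMBER)).intValue();
            List<String> texts = new ArrayList<>();
            for (int i = 1; i <= total; i++) {
                // The parentheses index the whole node-set; //li/a[i] would
                // instead ask for the i-th link inside each <li>.
                texts.add(xp.evaluate("string((//li/a)[" + i + "])", doc).trim());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read sidebar links", e);
        }
    }

    // Items of `a` that do not appear in `b`, keeping their order.
    static List<String> difference(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>(a);
        out.removeAll(b);
        return out;
    }

    public static void main(String[] args) {
        String sidebar = "<ul><li><a href='/'>Home</a></li><li><a href='/about'>About</a></li>"
                + "<li><a href='/blog'>Blog</a></li></ul>";
        List<String> expected = Arrays.asList("Home", "About", "Contact"); // hypothetical
        List<String> actual = linkTexts(sidebar);
        System.out.println("extra:   " + difference(actual, expected)); // extra:   [Blog]
        System.out.println("missing: " + difference(expected, actual)); // missing: [Contact]
    }
}
```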

Find and replace variable div contents

I have a php page which contains a large amount of HTML in it. One part of the HTML has a div in the following format:
<div class="reusable-block" id="xyzabcwy">there is a lot of HTML here which may be in any format</div>
Keep in mind that this div can be anywhere in the DOM; however, I do know the div ID programmatically.
I was originally finding this string in my database, since a record of it exists there. However, the data in the database record and on the page sometimes differ in whitespace; apart from the whitespace, the strings are exactly the same. The problem is that I don't know what format the whitespace is in.
It seems it is better to write a regular expression to find this div and replace it entirely.
I could use a hand though.
Other ideas are also welcome.
Many thanks!

If you are using jQuery,
$('#xyzabcwy').html(new_data);
if not
document.getElementById('xyzabcwy').innerHTML = new_data;
Or, to do it server-side, here is a PHP example.
Edit: PHP
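<?php
$id = "xyzabcwy";
$html = "<div id=\"" . $id . "\">this is html</div>";
$newdata = "test";
// Capture the opening and closing tags so only the div's content is replaced.
// Caveat: .*? stops at the first </div>, so nested divs inside the block break this.
$pattern = '#(<div[^>]*id="' . preg_quote($id, '#') . '"[^>]*>).*?(</div>)#si';
echo preg_replace($pattern, '${1}' . $newdata . '${2}', $html);
?>
This should output
<div id="xyzabcwy">test</div>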
Answer from: Replace a div content with PHP
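The regex answers stop at the first </div>, so they cut the block short if it contains nested divs, which the question allows ("a lot of HTML here which may be in any format"). One workaround, sketched here in Java with hypothetical names, finds the opening tag with a regex and then counts nested <div> and </div> tags to locate the matching close. It assumes no div tags appear inside comments, scripts or attribute values; for arbitrary markup, a real HTML/DOM parser is the safer route.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceDivContent {
    // Replaces the inner HTML of the <div> whose id equals `id`, keeping the tag itself.
    // Nested divs are handled by counting <div ...> and </div> after the opening tag.
    static String replaceInner(String html, String id, String newContent) {
        Pattern open = Pattern.compile(
                "<div\\b[^>]*\\sid\\s*=\\s*[\"']" + Pattern.quote(id) + "[\"'][^>]*>",
                Pattern.CASE_INSENSITIVE);
        Matcher m = open.matcher(html);
        if (!m.find()) {
            return html; // no such div: leave the page unchanged
        }
        Matcher tag = Pattern.compile("<(/?)div\\b[^>]*>", Pattern.CASE_INSENSITIVE).matcher(html);
        tag.region(m.end(), html.length());
        int depth = 1;
        while (tag.find()) {
            // group(1) is "/" for a closing tag and "" for an opening one.
            depth += tag.group(1).isEmpty() ? 1 : -1;
            if (depth == 0) {
                // tag.start() is where the matching </div> begins.
                return html.substring(0, m.end()) + newContent + html.substring(tag.start());
            }
        }
        return html; // unbalanced markup: give up rather than guess
    }

    public static void main(String[] args) {
        String page = "<p>before</p><div class=\"reusable-block\" id=\"xyzabcwy\">"
                + "a<div>nested</div>b</div><p>after</p>";
        // Prints <p>before</p><div class="reusable-block" id="xyzabcwy">test</div><p>after</p>
        System.out.println(replaceInner(page, "xyzabcwy", "test"));
    }
}
```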
