DOMXPath: how to get the content of the parent tag only, excluding child tags

I am using a DOMXPath query to fetch the content of the parent tag only, that is td[@class='s'], without the content of the div nested inside that td, as shown in my code below.
<?php
$second_trim='<td class="s" style="line-height:18px;">THIS TEXT IS REQUIRED and <div id="a" style="display:none;background-color:black;border:1px solid #ddd;padding:5px;color:black;">THIS TEXT IS NOT REQUIRED </div></td>';
$dom = new DOMDocument();
$dom->validateOnParse = true;
@$dom->loadHTML($second_trim);
libxml_clear_errors();
$xpath = new DOMXpath($dom);
$b = $xpath->query('//td[@class="s"]');
echo "<p style='font-size:14px;color:red;'><b style='font-size:18px;color:gray;'>cONTENT :- </b>".$b->item(0)->nodeValue."</p>";
?>
So how can I drop the content of that div tag and fetch only the td's own content? Any ideas?

EDIT
If you are only interested in the direct text content modify your xpath query:
$b = $xpath->query('//td[@class="s"]/text()');
echo '<p style="font-size:14px;color:red;">'
.'<b style="font-size:18px;color:gray;">cONTENT :- </b>'
.$b->item(0)->nodeValue
.'</p>';
Right now the result is very specific to the example:
if more than one direct text node exists, only the first one is displayed. To show all of them, foreach through the DOMNodeList $b and echo every selected node value.
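A minimal sketch of that loop, assuming made-up sample markup in which the td has text both before and after the nested div (the table wrapper and the variable names $parts and $text are only for illustration):

```php
<?php
// Hypothetical sample: direct text on both sides of the nested div.
$html = '<table><tr><td class="s">FIRST TEXT '
      . '<div id="a" style="display:none;">NOT REQUIRED</div>'
      . ' SECOND TEXT</td></tr></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// text() selects only the text nodes that are direct children of the td,
// so the text inside the nested div is never part of the result.
$parts = [];
foreach ($xpath->query('//td[@class="s"]/text()') as $textNode) {
    $value = trim($textNode->nodeValue);
    if ($value !== '') {
        $parts[] = $value;
    }
}

$text = implode(' ', $parts);
echo $text;
```

Trimming and joining with a single space is a presentation choice here; keep the raw node values if the original whitespace matters.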

Related

Remove the display:none attribute so that the item will be visible

I need to remove the display:none style so that the item will be visible.
This is similar to the question "remove attribute display:none; so the item will be visible", except that I am using PowerShell with Selenium.
The textarea Element is:
<textarea id="response" name="response" class="response" style="display: none;"></textarea>
I need to display this text area.
No luck with any of these commands:
$TextArea_Element.show()
$TextArea_Element.displayed.Clear()
$TextArea_Element.sendKeys("displayed".DELETE)
If I do a $TextArea_Element.displayed I get a value of "False"
Here is my Powershell code:
$browser = Start-SeChrome
$url = "somesite.com"
$browser.Navigate().GoToURL($url)
ForEach ($TextArea_Element in (Find-SeElement -Driver $browser -TagName TextArea))
{
$TextArea_Element
$TextArea_Element.displayed.Clear()
#$TextArea_Element.sendKeys("displayed".DELETE)
}
Please help.
Thanks
Try setting the element's style to display='block'.
You can run JavaScript against the element as below:
$browser.ExecuteScript("arguments[0].style.display='block';", $TextArea_Element)

DOMXPath multiple contain selectors not working

I have the following XPath query that a kind user on SO helped me with:
$xpath->query(".//*[not(self::textarea or self::select or self::input) and contains(., '{{{')]/text()") as $node)
Its purpose is to replace certain placeholders with a value. It correctly skips occurrences such as the one below, which should not be replaced:
<textarea id="testtextarea" name="testtextarea">{{{variable:test}}}</textarea>
And replaces correctly occurrences like this:
<div>{{{variable:test}}}</div>
Now I want to exclude elements that are of type <div> that contain the class name note-editable in that query, e.g., <div class="note-editable mayhaveanotherclasstoo">, in addition to textareas, selects or inputs.
I have tried:
$xpath->query(".//*[not(self::textarea or self::select or self::input) and not(contains(@class, 'note-editable')) and contains(., '{{{')]/text()") as $node)
and:
$xpath->query(".//*[not(self::textarea or self::select or self::input or contains(@class, 'note-editable')) and contains(., '{{{')]/text()") as $node)
I have followed the advice on some questions similar to this: PHP xpath contains class and does not contain class, and I do not get PHP errors, but the note-editable <div> tags are still having their placeholders replaced.
Any idea what's wrong with my attempted queries?
EDIT
Minimum reproducible DOM sample:
<div class="note-editing-area">
<textarea class="note-codable"></textarea>
<div class="note-editable panel-body" contenteditable="true" style="height: 350px;">{{{variable:system_url}}</div>
</div>
Code that does the replacement:
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
$xpath = new DOMXpath($dom);
foreach ($xpath->query(".//*[not(self::textarea or self::select or self::input or self::div[contains(@class,'note-editable')]) and contains(., '{{{')]/text()") as $node) {
    $node->nodeValue = preg_replace_callback('~{{{([^:]+):([^}]+)}}}~', function($m) use ($placeholders) {
        return $placeholders[$m[1]][$m[2]] ?? '';
    },
    $node->nodeValue);
}
$html = $dom->saveHTML();
echo html_entity_decode($html);
Use the XPath below.
.//*[not(self::textarea or self::select or self::input or self::div[contains(@class,'note-editable')]) and contains(., '{{{')]
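To see the exclusion at work, here is a small sketch that runs this expression (with /text() appended, as in the question's loop) against two divs. The sample markup, the "plain" class and the $placeholders values are made up for illustration:

```php
<?php
// Hypothetical markup: one div that must be left alone, one that must be replaced.
$html = '<div class="note-editable panel-body">{{{variable:test}}}</div>'
      . '<div class="plain">{{{variable:test}}}</div>';

// Hypothetical placeholder table.
$placeholders = ['variable' => ['test' => 'REPLACED']];

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// self::div[contains(@class,'note-editable')] is true only when the element
// being tested is itself a div carrying that class, so not(...) drops it.
$query = ".//*[not(self::textarea or self::select or self::input"
       . " or self::div[contains(@class,'note-editable')])"
       . " and contains(., '{{{')]/text()";

foreach ($xpath->query($query) as $node) {
    $node->nodeValue = preg_replace_callback(
        '~{{{([^:]+):([^}]+)}}}~',
        function ($m) use ($placeholders) {
            return $placeholders[$m[1]][$m[2]] ?? '';
        },
        $node->nodeValue
    );
}

$result = $dom->saveHTML();
echo $result;
```

Note that the test applies to the element holding the text node, not to its ancestors: a placeholder inside a child element of a note-editable div (for example a p) would still be replaced. If that matters, a predicate such as not(ancestor-or-self::div[contains(@class,'note-editable')]) may be a better fit.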

Find and extract content of division of certain class using DomXPath

I am trying to extract and save into PHP string (or array) the content of a certain section of a remote page. That particular section looks like:
<section class="intro">
<div class="container">
<h1>Student Club</h1>
<h2>Subtitle</h2>
<p>Lore ipsum paragraph.</p>
</div>
</section>
I can't narrow the search down by the class container, because several other sections on the same page use that class. There is only one section of class "intro", though, so I use the following code to find the right element:
$doc = new DOMDocument;
$doc->preserveWhiteSpace = FALSE;
@$doc->loadHTMLFile("https://www.remotesite.tld/remotepage.html");
$finder = new DomXPath($doc);
$intro = $finder->query("//*[contains(@class, 'intro')]");
At this point I hit a problem: I can't extract the content of $intro as a PHP string.
Trying the following code
foreach ($intro as $item) {
$string = $item->nodeValue;
echo $string;
}
gives only the text value: all the tags are stripped, and I really need all those div, h1, h2 and p tags preserved for further manipulation.
Trying:
foreach ($intro->attributes as $attr) {
$name = $attr->nodeName;
$value = $attr->nodeValue;
echo $name;
echo $value;
}
is giving the error:
Notice: Undefined property: DOMNodeList::$attributes in
So how could I extract the full HTML code of the found DOM elements?
I knew I was so close... I just needed to do:
foreach ($intro as $item) {
$h1= $item->getElementsByTagName('h1');
$h2= $item->getElementsByTagName('h2');
$p= $item->getElementsByTagName('p');
}
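If the goal is the full markup of each match rather than its individual child elements, one option (not the approach in the snippet above) is to pass the node to DOMDocument::saveHTML(), which serializes just that node and its descendants. A sketch, assuming the section from the question is loaded from an inline string instead of the remote page; $fragments is a name introduced here:

```php
<?php
// Inline copy of the section from the question, standing in for the remote page.
$html = '<section class="intro"><div class="container"><h1>Student Club</h1>'
      . '<h2>Subtitle</h2><p>Lore ipsum paragraph.</p></div></section>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // HTML5 tags such as <section> trigger libxml warnings
$doc->loadHTML($html);
libxml_clear_errors();

$finder = new DOMXPath($doc);
$intro = $finder->query("//*[contains(@class, 'intro')]");

// saveHTML($node) returns the outer HTML of that node, tags included.
$fragments = [];
foreach ($intro as $item) {
    $fragments[] = $doc->saveHTML($item);
}

echo implode("\n", $fragments);
```

The serializer may insert line breaks between block-level elements, so the string is not guaranteed to match the source byte for byte.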

PHP DOMDocument - match and remove URLs

I'm trying to extract links from html page using DOM:
$html = file_get_contents('links.html');
$DOM = new DOMDocument();
$DOM->loadHTML($html);
$a = $DOM->getElementsByTagName('a');
foreach($a as $link){
//echo out the href attribute of the <A> tag.
echo $link->getAttribute('href').'<br/>';
}
Output:
http://dontwantthisdomain.com/dont-want-this-domain-name/
http://dontwantthisdomain2.com/also-dont-want-any-pages-from-this-domain/
http://dontwantthisdomain3.com/dont-want-any-pages-from-this-domain/
http://domain1.com/page-X-on-domain-com.html
http://dontwantthisdomain.com/dont-want-link-from-this-domain-name.html
http://dontwantthisdomain2.com/dont-want-any-pages-from-this-domain/
http://domain.com/page-XZ-on-domain-com.html
http://dontwantthisdomain.com/another-page-from-same-domain-that-i-dont-want-to-be-included/
http://dontwantthisdomain2.com/same-as-above/
http://domain3.com/page-XYZ-on-domain3-com.html
I would like to remove all results matching dontwantthisdomain.com, dontwantthisdomain2.com and dontwantthisdomain3.com so that the output looks like this:
http://domain1.com/page-X-on-domain-com.html
http://domain.com/page-XZ-on-domain-com.html
http://domain3.com/page-XYZ-on-domain3-com.html
Any ideas? :)
I think you should use a regular expression. Google it and have fun.
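As an alternative to a regular expression, the host of each link can be compared against a blocklist using parse_url(). A rough sketch, assuming a short inline sample in place of links.html; the $blocked and $kept names are made up here:

```php
<?php
// Inline sample standing in for links.html (assumption).
$html = '<a href="http://dontwantthisdomain.com/dont-want-this-domain-name/">a</a>'
      . '<a href="http://domain1.com/page-X-on-domain-com.html">b</a>'
      . '<a href="http://dontwantthisdomain2.com/same-as-above/">c</a>'
      . '<a href="http://domain3.com/page-XYZ-on-domain3-com.html">d</a>';

// Hosts whose links should be dropped.
$blocked = [
    'dontwantthisdomain.com',
    'dontwantthisdomain2.com',
    'dontwantthisdomain3.com',
];

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$kept = [];
foreach ($dom->getElementsByTagName('a') as $link) {
    $href = $link->getAttribute('href');
    $host = parse_url($href, PHP_URL_HOST);

    // Links without a host (relative or malformed URLs) are skipped too;
    // that is a choice made for this sketch, not a requirement.
    if (!is_string($host) || in_array(strtolower($host), $blocked, true)) {
        continue;
    }
    $kept[] = $href;
}

echo implode('<br/>', $kept);
```

Comparing whole host names avoids accidental matches that a loose pattern could produce, for instance a wanted domain whose name merely contains one of the blocked names.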

Is it possible with jquery to use the nth-child() selector on an 'a' or an 'a:hover', not just 'li'?

I've created a navigation bar where the hover state of each link has to be a different color, so I'm trying to select the a:hover states with jQuery's nth-child() selector. I can get it to select the li element but not the a or the a:hover. Currently all the hovers are blue.
Here is the jQuery code I'm trying to use:
jQuery(document).ready(function() {
jQuery('#leftbar li:nth-child(3)').css('border-bottom', '#000000 5px solid');
});
The navigation is generated with PHP; here it is:
<ul id="leftbar">
<?php
$pagepath = "content/pages/";
$legalpath = "content/legals/";
$mainnavpath = "content/.system-use/navigation/";
$mainnavfile = $mainnavpath."mainnav.inc";
if (file_exists($mainnavfile)) {
require $mainnavfile;
sort ($mainfiles);
for($i=0; $i<count($mainfiles); $i++)
{
if (!preg_match("/XX-/",$mainfiles[$i])) {
$displayname = preg_replace("/\.inc/i", "", $mainfiles[$i]);
$displayname = substr($displayname, 3);
echo "<li>";
echo "<a ";
if ($page==$displayname) {echo ' class="active"';} else {echo ' class="prinav"';}
echo "title='$displayname' href='";
if ($useredirect=="yes"){echo '/'.$displayname.'/';} else {echo '/index.php?page='.$displayname;}
echo"' ";
echo "><span>$displayname</span></a></li>\n";
}}
}
else { echo "<strong>No Navigation - Please Login to your Admin System and set the Page Order</strong>"; }
?>
Here is the site I'm working on:
http://entourageuk.com/
Cheers!
Paul
You can't select using a CSS pseudo selector like :hover, but yes, you can select an <a> element.
Whether :nth-child is appropriate depends on your markup. I'm going to assume that each <a> is a child of the <li> elements you're selecting.
If that's the case, then you would just add a to the selector.
jQuery('#leftbar li:nth-child(3) > a').css(...
This uses the > child selector, and is basically saying that I want the <a> element(s) that is a direct child of the <li> element(s) that is the third child of its container and is a descendant of leftbar.