I'm trying to scrape a website using the DOM, but I'm unable to extract this:
<h1 data-reactid="218">
some info
some info
</h1>
I've been trying this query, but it doesn't work:
$allClass = $xpath->query("//h1[@data-reactid='218'");
foreach ($allClass as $urs)
{
foreach ($urs->attributes as $att2)
{
print_r($att2->value);
}
}
Could anyone help me please?
You are missing the closing bracket ] in your XPath expression.
Replace this line:
$allClass = $xpath->query("//h1[@data-reactid='218'");
With this line:
$allClass = $xpath->query("//h1[@data-reactid='218']");
That will give you:
218
I'm trying to add nodes from one document to a new document I create, but it's not working and I don't know why. Here's the code that's going wrong:
my ($body_node) = $newdoc->findnodes('//body');
my @nodes = $source_doc->findnodes('//div[starts-with(@psname, "xyz")]');
foreach my $node (@nodes) {
$body_node = $body_node->appendChild($node);
}
$newdoc->toFile($outfile);
The code looks for some named div tags and appends them to the body tag. The problem is that each div is appended to the previously added div, not to the body tag, so I end up with a bunch of nested divs:
</div></div></div></div></div></div></div></div></div></div></div></div>
</div></div></div></div></div></div></div></div></div></div></div></div></body></html>
If someone could tell me what I'm doing wrong I'd be eternally grateful.
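appendChild returns the node it has just appended, so after the first iteration $body_node points at the new <div> instead of <body>. That means you need to come back to <body> after adding each <div>: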
my ($body_node) = $newdoc->findnodes('//body');
my @nodes = $source_doc->findnodes('//div[starts-with(@psname, "xyz")]');
foreach my $node (@nodes) {
$body_node = $body_node->appendChild($node);
($body_node) = $newdoc->findnodes('//body');
}
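# Three-argument open with a lexical filehandle and an error check
open(my $out, '>', $outfile) or die "Cannot open $outfile: $!";
print $out $newdoc->toString();
close $out;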
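The nesting can be reproduced outside Perl. Below is a minimal sketch in JavaScript, assuming appendChild returns the appended child as DOM-style APIs do. SketchNode, buildNested and buildFlat are hypothetical names invented for this illustration and are not part of XML::LibXML or any real library.

```javascript
// Hypothetical minimal node type standing in for an XML/DOM node.
class SketchNode {
  constructor(name) {
    this.name = name;
    this.children = [];
  }

  // Assumption: like DOM-style appendChild, this returns the appended child.
  appendChild(child) {
    this.children.push(child);
    return child;
  }

  toString() {
    return `<${this.name}>${this.children.join("")}</${this.name}>`;
  }
}

// Mirrors the question: the parent variable is overwritten on every pass.
function buildNested(names) {
  const body = new SketchNode("body");
  let target = body;
  for (const name of names) {
    target = target.appendChild(new SketchNode(name)); // target is now the new child
  }
  return body.toString();
}

// Ignoring the return value keeps every child directly under body.
function buildFlat(names) {
  const body = new SketchNode("body");
  for (const name of names) {
    body.appendChild(new SketchNode(name));
  }
  return body.toString();
}

console.log(buildNested(["div", "div"])); // <body><div><div></div></div></body>
console.log(buildFlat(["div", "div"]));   // <body><div></div><div></div></body>
```

In the Perl code the equivalent of buildFlat would be to call $body_node->appendChild($node) without assigning the result back to $body_node, which makes the second findnodes call unnecessary.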
This is my TypoScript code:
lib.test.renderObj = COA
lib.test.renderObj.10 = TEXT
lib.test.renderObj.10.stdWrap.field = header
lib.test.renderObj.10.stdWrap.case = lower
lib.test.renderObj.10.stdWrap.trim = 1
lib.test.renderObj.10.stdWrap.wrap = <div class="element">|</div>
The header field is read correctly, the letters are lowercased, and the field is wrapped in the div element. The only problem is that I can't make the trim property take effect. I also tried the search/replace properties, with no success.
Any clue? :)
Search and replace has been available since 4.6. You can find the documentation here: https://docs.typo3.org/typo3cms/TyposcriptReference/6.2/Functions/Replacement/
lib.test.renderObj.10.stdWrap.replacement {
10 {
search = # #
replace =
useRegExp = 1
}
}
I don't know whether the replacement really works; I can't test it.
I am using a partial view to list the top 5 children of a specific node.
This works, but only if I put a div before the foreach, e.g.:
@inherits Umbraco.Web.Mvc.UmbracoTemplatePage
<div class="title">Test</div>
<ul>
@{
var ow = @owCore.Initialise(1085);
<div> </div>
var node = Umbraco.Content(1105);
foreach (var item in node
.Children.Where("Visible")
.OrderBy("Id descending")
.Take(5)
)
{
<li>@item.pageTitle</li>
}
}
</ul>
produces the expected unordered list.
However, if I remove the empty div:
@inherits Umbraco.Web.Mvc.UmbracoTemplatePage
Test
<ul>
@{
var ow = @owCore.Initialise(1085);
var node = Umbraco.Content(1105);
foreach (var item in node
.Children.Where("Visible")
.OrderBy("Id descending")
.Take(5)
)
{
<li>@item.pageTitle</li>
}
}
The error I get is
Compiler Error Message: CS1513: } expected
Source Error:
Line 113: }
Line 114: }
Line 115: }
Clearly it looks like there are too few closing '}'.
Presumably the div forces the closing }?
I have checked owCore (a library of functions I am building in App_Code). I have stripped it back so that it now does nothing, just to make sure the curly brackets are matched:
@using Umbraco
@using Umbraco.Core.Models
@using Umbraco.Web
@functions{
public static int Initialise(int siteDocID){
return 0;
}
}
However, if I remove the @owCore code from the partial view:
@inherits Umbraco.Web.Mvc.UmbracoTemplatePage
Test
<ul>
@{
var node = Umbraco.Content(1105);
foreach (var item in node
.Children.Where("Visible")
.OrderBy("Id descending")
.Take(5)
)
{
<li>@item.pageTitle</li>
}
}
</ul>
All is OK again.
Does that mean it's definitely an issue with owCore, or is something else tripping the mismatched {} error?
I have checked the template calling this partial view and can't find a problem.
This doesn't make sense. Can anyone explain?
Thanks!
This is actually more of a Razor question.
You start your code block with @{, and inside that block you don't need the @ in front of owCore. Removing it will make the view render even without the <div>, as the Razor parser is no longer confused by the @.
I have a JavaScript file which contains something like this:
var f = function(){
if(){
}else{
}
}
and I need to add ; at the end, like this:
var f = function(){
if(){
}else{
}
};
I need some help matching the function's closing } and not the } braces inside the function.
Thanks
You can't, if all you can use is regex, because a regular expression cannot keep track of arbitrarily nested braces. It'd be very much like parsing HTML with regex.
Instead, see if you can find an AST parser for JavaScript in PHP. That's what you'll need to be able to find the appropriate closing brace.
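To show why counting is needed where a plain regex falls short, here is a minimal sketch of a brace-depth scanner, written in JavaScript for illustration. It is not a parser: it assumes no braces appear inside strings, comments, or regex literals, and it only handles the first function expression it finds. The name addSemicolonAfterFunction is hypothetical.

```javascript
// Append ";" after the brace that closes the first "function(...){" block.
// Assumption: no braces inside strings, comments, or regex literals.
function addSemicolonAfterFunction(source) {
  const start = source.search(/function\s*\([^)]*\)\s*\{/);
  if (start === -1) return source; // no function expression found

  let depth = 0;
  for (let i = source.indexOf("{", start); i < source.length; i++) {
    if (source[i] === "{") depth++;
    else if (source[i] === "}") depth--;

    if (depth === 0) {
      // i is the brace that matches the function's opening brace
      if (source[i + 1] === ";") return source; // already terminated
      return source.slice(0, i + 1) + ";" + source.slice(i + 1);
    }
  }
  return source; // unbalanced braces: leave the input untouched
}

const input = "var f = function(){\nif(){\n}else{\n}\n}";
console.log(addSemicolonAfterFunction(input));
```

For real-world code the assumptions break quickly (a "}" inside a string literal is enough), which is why a proper parser remains the safer route.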
C# and Word interop:
I have a Word document with some text boxes (msoTextBox shapes). The problem is that I can't iterate through the Shapes collection with the code below:
foreach (Shape shape in WordDocument.Shapes)
{}
However, when I set a breakpoint on the loop line, I can see that WordDocument.Shapes.Count returns 4.
Note that the text boxes were inserted using the Open XML SDK.
I've found there's a problem when text boxes are used. Take a look at this solution.
From Code Project:
// Get the word count from all shapes
foreach (Word.Shape shape in wordDocument.Shapes)
{
if (shape.TextFrame.HasText < 0)
{
count+=GetCountFromRange(shape.TextFrame.TextRange,wordDocument,word);
}
}
From what you said, it looks like you are doing the right thing.
Can you give us the error stack trace?
PS: I know my question should have been in the comments, but it wouldn't have been readable :)
So, replace:
foreach (Shape shape in WordDocument.Shapes)
{
}
with:
foreach (Range rangeStory in WordDocument.StoryRanges)
{
foreach (Shape shape in rangeStory.ShapeRange)
{
}
}
It works perfectly.