How to begin using HTML DOM

I have trouble understanding how some things are related.
For a WordPress plugin, I would like to use HTML DOM on content from wp_remote_open to find a string.
In order to use DOM, does it have to be enabled by my web host, or do I include a DOM parsing script with the plugin?
I was thinking that if it needs to be enabled by the webhosting company, I would rather use a regular expression to find the string because then it would be compatible for everyone's installation.

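The DOM itself has nothing to do with your hosting provider or infrastructure. It is merely a model representing your HTML document. Most modern browsers support DOM. On the server side, PHP's DOM extension (DOMDocument) is enabled by default, although a host can build PHP without it, so it is worth checking that the class exists before relying on it. See more at the XML DOM introduction.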
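To illustrate why parsing is usually preferred over a regular expression for this kind of search, here is a minimal, language-neutral sketch. It is written in Python with the standard library only, not in PHP, so it is not plugin code. The names TextCollector and page_contains and the sample markup are made up for the example.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the text of a document, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def page_contains(html, needle):
    """Return True if `needle` occurs in the text content of `html`."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return needle in "".join(collector.parts)


# Hypothetical response body standing in for the fetched page.
sample = (
    "<html><body><p>Price: <b>42</b> EUR</p>"
    "<script>var x = 'secret';</script></body></html>"
)
print(page_contains(sample, "Price: 42 EUR"))  # text split across tags still matches
print(page_contains(sample, "secret"))         # script content is ignored
```

Because the search runs on the extracted text rather than on the raw markup, inline tags such as <b> do not break the match, which is the main thing a plain regular expression over the HTML tends to get wrong.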

Tracing DOM element id back to its ExtJs component

We are developing a web-based automation solution for a web application that is built using ExtJs.
Currently I am testing various object identification techniques to find the best way to identify web elements.
We'd like to use the IE developer tools (F12) to highlight and select DOM objects on the page, and (somehow) get their corresponding ExtJs component (along with its corresponding properties, such as itemId).
Is this possible to do through code or through some other technique?
I am unfamiliar with IE Dev tools for such things; however, I can attempt to answer the part about targeting specific components and their elements.
You can target Ext components in several ways:
Ext.ComponentQuery.query(CQselector) method (see docs for examples)
Ext.getCmp(componentID) if you know the component ID
up() and down() methods from any container/component. These also take CQselector expressions
All of these methods are accessible from the page, since the Ext library is loaded. In browsers like FF and Chrome you can execute them directly from the console. I am guessing they should similarly be available in IE Dev tools.
Once you have a reference to the Ext component, you can get its HTML elements through .dom, .el or similar properties. Or you could use DOM query directly.
I believe that if you set the id property rather than the itemId, you can achieve the desired result as this is passed through as the html id property of the top level container for the component (I think!). It's a little complicated to get that to work with accuracy though given the amount of nested divs/tables that are used in most of the extjs components. Good luck!
Hard to tell what you're looking for, but if you're trying to get a reference to an Ext.Component that is rendered, you can look for the wrapper node for your component in the HTML structure. The HTML id is the same as the component id. If you run var comp = Ext.getCmp('some-id-12345') and if that returns something, you've found the wrapper for an Ext.Component.
You could then use comp.itemId to retrieve the itemId.
You should look into Illuminations for Developers (http://www.illuminations-for-developers.com/), a plugin for Firebug that shows Ext.Components.
You can also use the Sencha Page Analyzer to see the entire component tree.

dealing with itextsharp XMLWorkerHelper.ParseXHTML strict behavior

While trying to use XMLWorkerHelper.GetInstance().ParseXHTML(), I find that it is really strict. Any wrong order of tags or unclosed tags will cause it to throw an exception.
I am converting HTML that I have no control over.
Are there any flags to make it less strict? An input callback interface to handle funny markup? Anything in the itextsharp.tools.xml.html? Or an entirely new library compatible with itextsharp.text.IElement?
The names of the class and the method pretty much sum it up: you can't. The entire pipeline is based on the assumption that a well-formed XML document will be passed in; anything else will throw an exception. You can customize the pipeline and add your own handlers for things like link resolution, custom CSS properties and new HTML tags, but the core document processor still needs well-formed XHTML.
I would recommend looking into running your HTML through a library that can convert it to XHTML.
EDIT
Also check out wkhtmltopdf. It uses WebKit to render HTML and (apparently) does a pretty good job.
How to use wkhtmltopdf.exe in ASP.net
wkhtmltopdf.exe System.Security.SecurityException on cloud web server. How can i override server security policy
C# html to pdf converter using wkhtmltopdf or any other free tools

How do I visualise/pretty-print a HTML DOM tree?

Now that I can navigate a Web page via WWW::Mechanize and get information via HTML::TreeBuilder::XPath by accessing an id, I am left using Firebug to read the DOM in order to discover the layout of the HTML tree. The content that Mechanize captures is unstructured HTML, not good for human eyes.
Is using Firebug to ascertain the id I am after a typical approach? Once I get the id then I'm good to go, it's just that I've got several ids and pages with more ids to chase down and I was hoping to get (dump, print, etc.) a formatted layout of the DOM in order to make that discovery easier. Though granted, Firebug makes it pretty easy, too. I'm just wondering if I am missing an easier method.
Crossposted at PerlMonks.
If you need text, xmllint --html --format (comes with libxml2) does a decent job.
If you want a tree and mess with it and test out various expressions in a GUI, then Xacobeo is your new best friend.
Note: since both those tools rely on libxml, replace HTML::TreeBuilder::XPath with HTML::TreeBuilder::LibXML for compatibility. Evaluating XPath will be faster that way, too.
If you know JavaScript/jQuery, then also install FireQuery. You can then test out CSS expressions in Firebug, and use them with modules that select HTML through CSS expressions, e.g. Web::Query.
I use Oxygen XML Developer for my recent XPath development:
http://www.oxygenxml.com/download.html
It is a 30-day trial type of tool, but you can also search for "XPath visualizer".
It doesn't visualize a tree for you as far as I know (maybe there's a panel doing that). But it gives you smart completion functionality that helps you know which nodes are available at any point. That matters a lot for XPath, because it is hard to know where the parser is actually pointing.
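For the "dump a formatted layout of the DOM" part of the question, the idea can also be sketched in a few lines without any of the tools above. This is a minimal sketch in Python's standard library, not Perl. The class name OutlinePrinter, the function outline and the sample markup are made up for illustration. It assumes reasonably well-nested HTML, so unclosed tags such as a bare <li> will skew the indentation.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not increase the depth.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class OutlinePrinter(HTMLParser):
    """Records one indented line per element, with its id when it has one."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        label = tag
        element_id = dict(attrs).get("id")
        if element_id:
            label += "#" + element_id
        self.lines.append("  " * self.depth + label)
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


def outline(html):
    """Return an indented outline of the element tree in `html`."""
    printer = OutlinePrinter()
    printer.feed(html)
    printer.close()
    return "\n".join(printer.lines)


# Hypothetical page content, e.g. what a fetch of the page would return.
sample = (
    '<html><body><div id="main"><h1 id="title">Hi</h1>'
    "<br><p>Text</p></div></body></html>"
)
print(outline(sample))
```

Printing only tag names and ids keeps the output short enough to scan for the id you are after. A libxml-based tool remains the better choice for messy real-world markup.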

What is the real idea behind the concept of Document Object Model (DOM)?

I am a beginner in HTML & HTML5.
As I was reading through the following link, I found the terms DOM and DOM API. I read through the Wikipedia article, but was not able to digest the whole idea behind it.
Could somebody explain to me:
the real idea behind the concept of Document Object Model (DOM)?
how is it related to HTML5?
Thanks,
Sen
From Wikipedia:
The Document Object Model (DOM) is a cross-platform and language-independent convention for representing and interacting with objects in HTML, XHTML and XML documents.
Simply put, it's how browsers (amongst other clients) represent web documents. The DOM is not specific to HTML5. It's been there from the get-go.
DOM API basically means how you, as a programmer, can interact with the DOM. Some examples might be adding elements to the DOM, changing their styles, and other common operations you would do on a web document.
In the context of HTML5, there are several additions to the DOM that didn't exist in previous versions of the HTML spec, such as <video> and <audio> elements.
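Since the DOM is language-independent, the operations mentioned above (adding elements, changing attributes) can be tried outside a browser too. Below is a minimal sketch using Python's xml.dom.minidom, which implements a subset of the W3C DOM API. It parses XML rather than HTML, so the sample markup is deliberately well-formed, and the document content is made up for illustration.

```python
from xml.dom.minidom import parseString

# Build a DOM tree from a small, well-formed document.
doc = parseString("<html><body><p id='intro'>Hello</p></body></html>")
body = doc.getElementsByTagName("body")[0]

# Add a new element with a text node to the tree.
note = doc.createElement("p")
note.appendChild(doc.createTextNode("Added through the DOM API"))
body.appendChild(note)

# Change an attribute on an existing element.
intro = doc.getElementsByTagName("p")[0]
intro.setAttribute("style", "color: red")

# Serialize the modified subtree back to markup.
print(body.toxml())
```

The same method names (createElement, appendChild, setAttribute, getElementsByTagName) exist on the document object in browser JavaScript, which is what "DOM API" refers to in practice.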
The DOM is the browser's internal representation of the HTML document.
The DOM API is the way of programming the DOM, using JavaScript when in a browser.
HTML5 is just a new flavour of HTML. It uses the DOM in exactly the same way.
What Mark Pilgrim is saying is that there are certain things you can do with HTML5 DOM elements through the DOM API, such as start a video file playing. So, if you have a <video> DOM object in JavaScript, you can call its .play() method from JavaScript. This is an example of the DOM API.
The document object model is the browser's internal representation of HTML. It's based on the idea of 'children'. So a <p> tag might contain several text nodes and several <span> tags, like this:
<p><span>Hello,</span> this is some text. <span>It</span> is just a short paragraph</p>
This <p> tag has 4 children: two <span>s and two text nodes (" this is some text. " and " is just a short paragraph"). The other bits of text are children of their respective <span> tags.
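The parent/child structure described above can be checked with a small sketch. It uses Python's standard html.parser to build a simplified tree. The Node and TreeBuilder classes are made up for illustration and are far simpler than a browser's real DOM implementation.

```python
from html.parser import HTMLParser


class Node:
    """A simplified DOM node: an element, or a '#text' node carrying text."""

    def __init__(self, name, text=None):
        self.name = name
        self.text = text
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a tree of Node objects; assumes well-nested markup."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self._stack[-1].children.append(node)
        self._stack.append(node)

    def handle_endtag(self, tag):
        if len(self._stack) > 1:
            self._stack.pop()

    def handle_data(self, data):
        self._stack[-1].children.append(Node("#text", data))


builder = TreeBuilder()
builder.feed(
    "<p><span>Hello,</span> this is some text. "
    "<span>It</span> is just a short paragraph</p>"
)
builder.close()

p = builder.root.children[0]
for child in p.children:
    if child.name == "#text":
        print(child.name, repr(child.text))
    else:
        print(child.name, [c.text for c in child.children])
```

The loop lists four children of <p>, alternating span and text nodes, with "Hello," and "It" appearing one level down as children of their spans.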
The browser stores this information (instead of just storing a huge stream of HTML, which is very difficult to process) in its internal memory. This makes it much easier to format it using Cascading Style Sheets (CSS) and to make changes to it using JavaScript (create and delete parts, move parts from one parent to another, etc).
All versions of HTML (except perhaps very early ones) use the DOM. Each version has rules, such as which tags are valid, and which can be children of each element. These rules are applied when processing the HTML and creating a DOM representation of it.
The DOM is the object representation of the HTML document; each web page is a collection of DOM objects.

GWT and templating engine

I want to design a website using GWT. This is my understanding of how GWT pages will be delivered to the client browser: when the user enters the URL in her browser, she receives all the static HTML plus the GWT JavaScript, and then the JavaScript queries the server for the dynamic page content and adds it to the DOM. For example, for a blog page, the content of the blog is queried by the JavaScript. Is my understanding correct?
If I know that the content will surely be a part of the page (and does not depend on the user clicking an expand button, etc.), will it be more efficient if the blog content was a part of the HTML initially served? That is something that could be done by using a templating engine like Django's.
Is there a way to make a templating mechanism in GWT?
Yes, putting your content into the HTML will reduce the number of round trips the client makes to your server. It also means that the blog content won't have to wait for your GWT JavaScript to load before it can be displayed.
GWT itself isn't useful as a template system, but most servers that run GWT servlets will also support JSP pages. GWT works fine with these pages; you just need to put the GWT script tag in as usual. You will no doubt be able to find a ready-made templating solution, but rolling your own is not too hard.
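As a rough illustration of "rolling your own", the idea is simply to substitute the content into the host page on the server before it is sent, and leave the GWT script tag in place. The sketch below shows this in Python with the standard library, not as JSP or Java. The function render_post, the template layout and the script path myapp/myapp.nocache.js are hypothetical placeholders.

```python
from html import escape
from string import Template

# Host page with placeholders; the script tag stands in for the GWT bootstrap script.
PAGE = Template("""<!DOCTYPE html>
<html>
  <head><title>$title</title></head>
  <body>
    <div id="post">$body</div>
    <script src="$script_url"></script>
  </body>
</html>""")


def render_post(title, body, script_url="myapp/myapp.nocache.js"):
    """Return the host page with the post content already embedded."""
    return PAGE.substitute(
        title=escape(title),  # escape so content cannot inject markup
        body=escape(body),
        script_url=escape(script_url, quote=True),
    )


print(render_post("Hello <world>", "First post"))
```

The browser can display the post as soon as the HTML arrives, and the client-side code can later attach to the existing "post" element instead of fetching the content in a second request.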