Why are some of my tags being removed (GWT)? - gwt

I'm adding an element to a document with the following:
Element parent = getParentElement(); // Returns the right thing.
HTML html = new HTML();
html.setHTML( "<td>BLAH</td>" );
parent.appendChild( html.getElement() );
When I view the resulting document with Firebug, though, the parent's child looks like this:
<div class="gwt-HTML"> BLAH </div>
I can use Firebug to add the <td> elements manually, and all my formatting then applies. Does anyone know why the HTML element seems to be removing my <td> tags?

It turns out that it's Firefox that's stripping it out. If I just use plain old JavaScript to create a div, or a tr, and set innerHTML to <td>BLAH</td>, the tags still get stripped. A couple of others have noticed this as well: http://www.jtanium.com/2009/10/28/firefox-gotcha-innerhtml-strips-td-tags/
If I use JavaScript to create a <table> tag and add it to the DOM, I can then place the <td> in that. Of course, it helpfully creates a <tbody><tr> for me as well, so I'm not really getting back what I put in.

Related

tinymce <span> gets removed when containing <br />

I'm using Tiny 4.9.10 to dynamically generate reports based on templates. Users can create templates which contain placeholders. These placeholders then get swapped out for their actual values when generating the actual report. The placeholders get their style (including font, which is the main issue here) from their enclosing <span>-tag.
When replacing the placeholder with their actual value, we use <br />-tags to insert new lines, since some of the placeholders are almost full reports on their own which need to be structured.
After the placeholders have all been replaced, we inject this dynamically generated content back into a Tiny editor, so as to allow users to make ad hoc changes to the content.
At this point however we noticed that the <span>-tag around a piece of generated content containing <br />-tags gets removed. This is a problem, because the style info that was enclosed in this tag gets removed as well, resulting in problems further down the line when generating a PDF.
What I've tried to work around this:
setting verify_html to false
adding +span[br]/+span[br /] to valid_children
setting forced_root_block to div
The first two options did nothing to help me, and while the last one looked promising, it didn't help, because even when using <div>, font info gets enclosed into a child <span>.
I know this is expected behavior, because <span> is an inline tag and so it shouldn't have <br /> tags as children, but I'm currently at a loss for a workaround which allows me to include <br /> tags into my dynamically generated content without losing the style (most importantly the font) of the parent tag.
I solved this by replacing the <span> tags with <div> tags when we swap out the placeholders, using a regex that looks for spans enclosing a <p>...</p> or a <br />. This stops Tiny from discarding the enclosing tag, and its style, when it contains either of these elements.
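The regex replacement described above could look roughly like the following. This is only a sketch: the function name spansToDivs is hypothetical, the answer does not show its actual regex, and the pattern assumes spans are not nested inside one another.

```javascript
// Hypothetical helper (not from the original answer): rewrite
// <span ...>...</span> as <div ...>...</div> when the span's content
// contains a <br> or a <p>. Assumes spans are not nested.
function spansToDivs(html) {
  const spanPattern = /<span\b([^>]*)>([\s\S]*?)<\/span>/gi;
  // Non-global on purpose, so test() keeps no state between calls.
  const blockPattern = /<br\s*\/?>|<p[\s>]/i;
  return html.replace(spanPattern, (match, attrs, inner) =>
    blockPattern.test(inner) ? `<div${attrs}>${inner}</div>` : match
  );
}

// The style attribute is carried over to the <div> unchanged.
console.log(spansToDivs('<span style="font-family: Arial">a<br />b</span>'));
```

Spans without a line break or paragraph inside are returned untouched, so ordinary inline styling keeps its <span> wrapper.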
TinyMCE considers the <span> <br /> </span> construct an empty space and deletes it in favor of optimization.
I may be late, but you can also try using this callback in the setup option to stop the editor from removing empty spans:
setup: function(editor) {
  editor.on('PreInit', function() {
    editor.schema.getElementRule('span').removeEmpty = false;
  });
}

In Tritium, how do I transform all <p> tags to <div> tags?

I’m working in the Moovweb SDK and am optimizing my personal desktop site for mobile.
How do I transform all my <p> tags to <div> tags? I really don't want to do it manually! Search and replace?? haha
You can use the name() function to change the name of an element. For example:
$("//p") {
  name("div")
}
See it in action here: http://tester.tritium.io/bd1be4f2c187aed317351688e23f01127d26343a
Cheap way: Add p{margin:0} to your CSS, this will remove the only special styling of <p> tags making them look like <div>s.
This is only a visual effect, though. For instance, you're still not allowed to put a <form> inside a <p>, even with the above CSS. If that's what you're after, a simple search and replace will do:
Replace <p> with <div>
Replace <p␣ (left angle, p, space) with <div␣ (there's a space at the end of that one too)
Replace </p> with </div>
That should do it!
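The three replacements above can also be scripted instead of done by hand. Here is a minimal JavaScript sketch of them; the function name pToDiv is made up for illustration, and it treats the markup as a plain string rather than parsing it.

```javascript
// Hypothetical helper: apply the search-and-replace steps to an HTML string.
function pToDiv(html) {
  return html
    // "<p>" and "<p " (with attributes) become "<div>" / "<div ".
    // The lookahead keeps tags such as <pre> or <param> untouched.
    .replace(/<p(?=[\s>])/gi, '<div')
    // "</p>" becomes "</div>".
    .replace(/<\/p>/gi, '</div>');
}

console.log(pToDiv('<p class="intro">Hello</p><pre>code</pre>'));
```

Matching on what follows `<p` is the scripted equivalent of searching for `<p>` and `<p␣` separately.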

Trouble pinpointing child elements while using Mojo::DOM

I'm trying to extract text from an old vBulletin forum using WWW::Mechanize and Mojo::DOM.
vBulletin doesn't use HTML and CSS for semantic markup, and I'm having trouble using Mojo::DOM->children to get at certain elements.
These vBulletin posts are structured differently depending on their content.
Single message:
<div id="postid_12345">The quick brown fox jumps over the lazy dog.</div>
Single message quoting another user:
<div id="postid_12345">
<div>
<table>
<tr>
<td>
<div>Quote originally posted by Bob</div>
<div>Everyone knows the sky is blue.</div>
</td>
</tr>
</table>
</div>
I disagree with you, Bob. It's obviously green.
</div>
Single message with spoilers:
<div id="postid_12345">
<div class="spoiler">Yoda is Luke's father!</div>
</div>
Single message quoting another user, with spoilers:
<div id="postid_12345">
<div>
<table>
<tr>
<td>
<div>Quote originally posted by Fred</div>
<div class="spoiler">Yoda is Luke's father!</div>
</td>
</tr>
</table>
</div>
<div class="spoiler">No waaaaay!</div>
</div>
Assuming the above HTML and an array packed with the necessary post IDs:
for (@post_ids) {
    $mech->get($full_url_of_specific_forum_post);
    my $dom = Mojo::DOM->new($mech->content);
    my $div_id = '#postid_' . $_;    # CSS id selector, so it needs the leading '#'
    say $dom->at($div_id)->children('div')->first;
    say $dom->at($div_id)->text;
}
Using $dom->at($div_id)->all_text gives me everything in an unbroken line, which makes it difficult to tell what's quoted and what's original in the post.
Using $dom->at($div_id)->text skips all of the child elements, so quoted text and spoilers are not picked up.
I've tried variations of $dom->at($div_id)->children('div')->first, but this gives me everything, including the HTML.
Ideally, I'd like to be able to pick up all the text in each post, with each child element on its own line, e.g.
POSTID12345:
+ Quote originally posted by Bob
+ Everyone knows the sky is blue.
I disagree with you, Bob. It's obviously green.
I'm new to Mojo and rusty with Perl. I wanted to solve this on my own, but after looking over the documentation and fiddling with it for a few hours, my brain is mush and I'm at a loss. I'm just not getting how Mojo::DOM and Mojo::Collections work.
Any help will be greatly appreciated.
Looking at the source of Mojo::DOM, the all_text method basically walks the DOM recursively and extracts all the text. Use that source as a model for your own DOM-walking function. Its recursion returns a single string; yours could return an array carrying whatever context you need.
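To illustrate the idea of a recursive walk that returns an array instead of one string, here is a language-neutral sketch written in JavaScript. It does not use Mojo::DOM. The tree shape (a node is either a text string or an object with tag and children) and the names collectLines and formatPost are assumptions made for the example.

```javascript
// Assumed tree shape (not Mojo::DOM's): a node is either a string (text)
// or an object { tag, children }.
function collectLines(node, depth = 0, lines = []) {
  if (typeof node === 'string') {
    const text = node.trim();
    if (text) lines.push({ depth, text }); // remember how deep the text sat
    return lines;
  }
  for (const child of node.children) {
    // Text directly inside an element keeps that element's depth;
    // child elements are one level deeper.
    collectLines(child, typeof child === 'string' ? depth : depth + 1, lines);
  }
  return lines;
}

// Mark text from nested elements (quotes, spoilers) with "+ ".
function formatPost(id, root) {
  const body = collectLines(root).map(({ depth, text }) =>
    depth > 0 ? `+ ${text}` : text);
  return [`POSTID${id}:`, ...body].join('\n');
}

const post = { tag: 'div', children: [
  { tag: 'div', children: [
    { tag: 'div', children: ['Quote originally posted by Bob'] },
    { tag: 'div', children: ['Everyone knows the sky is blue.'] },
  ] },
  "I disagree with you, Bob. It's obviously green.",
] };
console.log(formatPost(12345, post));
```

Because each entry keeps its depth, the caller decides how to present quoted or spoiler text instead of receiving one unbroken string.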
EDIT:
After some discussion on IRC, the web scraping example has been updated; it might help guide you. http://mojolicio.us/perldoc/Mojolicious/Guides/Cookbook#Web_scraping
There is a module for flattening an HTML tree: HTML::Linear.
The explanation of why you would flatten an HTML tree is a bit long and boring, so here's a picture showing the output of the xpathify tool, which is bundled with that module:
As you can see, the HTML tree nodes become a single key/value list, where the key is the XPath for that node and the value is the node's text attribute.
In a few keystrokes, this is how you use HTML::Linear:
#!/usr/bin/env perl
use strict;
use utf8;
use warnings;
use Data::Printer;
use HTML::Linear;
my $hl = HTML::Linear->new;
$hl->parse_file(q(vboard.html));
for my $el ($hl->as_list) {
    my $hash = $el->as_hash;
    next unless keys %{$hash};
    p $hash;
}

Can I append an Ajax requestXML object to my document tree all in one go?

Greetings.
Here is an XML object returned by my server in the responseXML object:
<tableRoot>
<table>
<caption>howdy!</caption>
<tr>
<td>hello</td>
<td>world</td>
</tr>
<tr>
<td>another</td>
<td>line</td>
</tr>
</table>
</tableRoot>
Now I attach this fragment to my document tree like so:
getElementById('entryPoint').appendChild(responseXML.firstChild.firstChild);
But instead of being rendered as a table, I get the following text:
howdy! helloworldanotherline
The same result occurs if I replace firstChild.firstChild with just firstChild.
It seems like I'm just getting the nodeValues, and all of the tags are stripped out?!
Am I fundamentally misunderstanding what the responseXML object is supposed to represent?
This works, BTW, if I take out the 'root' tags, and set innerHTML to responseText.
Can someone please enlighten me on the correct way to use responseXML?
You get text instead of a table because you are manipulating the pure DOM and your response XML has no namespace declaration. When the XML element is appended, the browser doesn't know whether your "table" tag comes from HTML, XUL, SVG, or something else.
1) Add namespace declaration:
<table xmlns="http://www.w3.org/1999/xhtml">
2) Instead of directly inserting the referenced XML DOM element, first import that node into your HTML DOM document:
var element = document.importNode(responseXML.firstChild.firstChild, true);
document.getElementById('entryPoint').appendChild(element);
Hope this helps!
You can create an element at the position where you want to insert, and then do
element.innerHTML = request.responseText

php - splitting a string with HTML by the first instance of a table cell

I am checking on HTML content on my page, and I've got the split down to have the variable left with this content:
">
<td>Oklahoma City</td>
<td>Oklahoma</td>
<td>OK</td>
<td>405</td>
<td>CST</td>
</tr>
</table>
<div id="
Those are dynamic pages I'm checking, so the data will always be different, but the layout the same...
How can I get the value out of the second <td> if that html is in 1 variable(string)?
It was a full page. I used explode twice to remove everything above a div and everything below the last div's id, so some open HTML tags are left over because I did not know how to get rid of them along the way. Ideally I would be left with just this:
<td>Oklahoma City</td>
<td>Oklahoma</td>
<td>OK</td>
<td>405</td>
<td>CST</td>
</tr>
</table>
Can you tell me how to get that out? I just need the second one because it is the county and that is what I'm checking on...
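One way to approach this is to collect the contents of every <td>...</td> pair and take the second entry. The sketch below is in JavaScript rather than PHP, and the function name tdValues is made up. A similar pattern should carry over to PHP's preg_match_all, though that is not shown or tested here. It assumes the cells are not nested.

```javascript
// Hypothetical helper: return the text of every <td> cell in a fragment,
// ignoring any leftover partial tags around the cells.
function tdValues(fragment) {
  const cells = [];
  const cellPattern = /<td\b[^>]*>([\s\S]*?)<\/td>/gi;
  for (const match of fragment.matchAll(cellPattern)) {
    cells.push(match[1].trim());
  }
  return cells;
}

const fragment = `">
<td>Oklahoma City</td>
<td>Oklahoma</td>
<td>OK</td>
<td>405</td>
<td>CST</td>
</tr>
</table>
<div id="`;

console.log(tdValues(fragment)[1]); // second cell
```

Since the pattern only looks at complete <td> pairs, the stray `">` and `<div id="` left over from the earlier splits do not need to be cleaned up first.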